Compare commits


1 commit
main ... my-ssh

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| ydshieh | 994eb702bb | run ssh | 2024-05-28 10:09:20 +02:00 |
245 changed files with 12494 additions and 4233 deletions

View File

@ -98,7 +98,7 @@ jobs:
fetch_all_tests:
working_directory: ~/transformers
docker:
- image: huggingface/transformers-quality
- image: huggingface/transformers-consistency
parallelism: 1
steps:
- checkout

View File

@ -17,50 +17,50 @@ body:
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
a core maintainer will ping the right person.
Please tag fewer than 3 people.
Models:
- text models: @ArthurZucker and @younesbelkada
- vision models: @amyeroberts
- speech models: @sanchit-gandhi
- graph models: @clefourrier
Library:
- flax: @sanchit-gandhi
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- generate: @gante
- pipelines: @Narsil
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker
- trainer: @muellerzr @SunMarc
- trainer: @muellerzr and @pacman100
Integrations:
- deepspeed: HF Trainer/Accelerate: @muellerzr
- deepspeed: HF Trainer/Accelerate: @pacman100
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc and @younesbelkada
Documentation: @stevhliu
Model hub:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @sanchit-gandhi
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
@ -101,11 +101,11 @@ body:
placeholder: |
Steps to reproduce the behavior:
1.
2.
3.
- type: textarea
id: expected-behavior

View File

@ -47,15 +47,15 @@ Models:
Library:
- flax: @sanchit-gandhi
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- generate: @gante
- pipelines: @Narsil
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker
- trainer: @muellerzr and @SunMarc
- trainer: @muellerzr and @pacman100
Integrations:
- deepspeed: HF Trainer/Accelerate: @muellerzr
- deepspeed: HF Trainer/Accelerate: @pacman100
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc and @younesbelkada

View File

@ -70,6 +70,16 @@ jobs:
name: "Latest PyTorch + DeepSpeed"
runs-on: [intel-cpu, 8-cpu, ci]
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
@ -106,6 +116,16 @@ jobs:
name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
runs-on: [intel-cpu, 8-cpu, ci]
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
@ -182,6 +202,16 @@ jobs:
if: inputs.image_postfix != '-push-ci'
runs-on: [intel-cpu, 8-cpu, ci]
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

View File

@ -13,7 +13,7 @@ concurrency:
jobs:
latest-with-torch-nightly-docker:
name: "Nightly PyTorch + Stable TensorFlow"
runs-on: [intel-cpu, 8-cpu, ci]
runs-on: ubuntu-22.04
steps:
- name: Cleanup disk
run: |
@ -50,7 +50,7 @@ jobs:
nightly-torch-deepspeed-docker:
name: "Nightly PyTorch + DeepSpeed"
runs-on: [intel-cpu, 8-cpu, ci]
runs-on: ubuntu-22.04
steps:
- name: Cleanup disk
run: |

View File

@ -16,7 +16,7 @@ jobs:
fail-fast: false
matrix:
version: ["1.13", "1.12", "1.11"]
runs-on: [intel-cpu, 8-cpu, ci]
runs-on: ubuntu-22.04
steps:
-
name: Set up Docker Buildx
@ -60,7 +60,7 @@ jobs:
fail-fast: false
matrix:
version: ["2.11", "2.10", "2.9", "2.8", "2.7", "2.6", "2.5"]
runs-on: [intel-cpu, 8-cpu, ci]
runs-on: ubuntu-22.04
steps:
-
name: Set up Docker Buildx

View File

@ -80,7 +80,7 @@ jobs:
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
run: python3 -m pytest -rs -v --make-reports=${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}

View File

@ -5,6 +5,7 @@ on:
branches: [ main ]
env:
IS_GITHUB_CI: "1"
OUTPUT_SLACK_CHANNEL_ID: "C06L2SGMEEA"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
@ -85,7 +86,7 @@ jobs:
- name: Run FA2 tests
id: run_fa2_tests
run:
pytest -rsfE -m "flash_attn_test" --make-reports=${{ matrix.model-name }}_fa2_tests/ tests/${{ matrix.model-name }}/test_modeling_*
pytest -rs -m "flash_attn_test" --make-reports=${{ matrix.model-name }}_fa2_tests/ tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: ${{ matrix.model-name }}_fa2_tests"
if: ${{ always() }}
@ -107,7 +108,7 @@ jobs:
id: run_integration_tests
if: always()
run:
pytest -rsfE -k "IntegrationTest" --make-reports=tests_integration_${{ matrix.model-name }} tests/${{ matrix.model-name }}/test_modeling_*
pytest -rs -k "IntegrationTest" --make-reports=tests_integration_${{ matrix.model-name }} tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: tests_integration_${{ matrix.model-name }}"
if: ${{ always() }}

View File

@ -110,7 +110,7 @@ jobs:
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v -rsfE --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
run: python3 -m pytest -v -rs --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}

View File

@ -1,11 +1,14 @@
name: SSH into our runners
on:
push:
branches:
- my-ssh
workflow_dispatch:
inputs:
runner_type:
description: 'Type of runner to test (a10 or t4)'
required: true
required: true
docker_image:
description: 'Name of the Docker image'
required: true
@ -14,23 +17,24 @@ on:
required: true
env:
IS_GITHUB_CI: "1"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
RUN_PT_TF_CROSS_TESTS: 1
jobs:
ssh_runner:
name: "SSH"
runs-on: ["${{ github.event.inputs.num_gpus }}-gpu", nvidia-gpu, "${{ github.event.inputs.runner_type }}", ci]
runs-on: ["multi-gpu", nvidia-gpu, t4, ci]
container:
image: ${{ github.event.inputs.docker_image }}
image: huggingface/transformers-all-latest-gpu
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
@ -38,26 +42,24 @@ jobs:
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Tailscale # In order to be able to SSH when a test fails
uses: huggingface/tailscale-action@main
with:
authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
waitForSSH: true
waitForSSH: true

View File

@ -51,10 +51,6 @@ RUN python3 -m pip install --no-cache-dir bitsandbytes
# Some tests require quanto
RUN python3 -m pip install --no-cache-dir quanto
# `quanto` will install `ninja` which leads to many `CUDA error: an illegal memory access ...` in some model tests
# (`deformable_detr`, `rwkv`, `mra`)
RUN python3 -m pip uninstall -y ninja
# For `dinat` model
# The `XXX` part in `torchXXX` needs to match `PYTORCH` (to some extent)
RUN python3 -m pip install --no-cache-dir natten==0.15.1+torch220$CUDA -f https://shi-labs.com/natten/wheels

View File

@ -162,7 +162,7 @@ Transformers uses the shell environment variables `PYTORCH_TRANSFORMERS_CACHE`
## Offline mode
Transformers can run in a firewalled or offline environment by using only local files. Set the environment variable `HF_HUB_OFFLINE=1` to enable this behavior.
Transformers can run in a firewalled or offline environment by using only local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this behavior.
<Tip>
@ -179,7 +179,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Run the same program in an offline instance with:
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -86,10 +86,10 @@ model.load_adapter(peft_model_id)
The `bitsandbytes` integration supports 8-bit and 4-bit precision data types, which are useful for loading large models because it saves memory (read the `bitsandbytes` integration [guide](./quantization#bitsandbytes-integration) to learn more). Add the `load_in_8bit` or `load_in_4bit` parameters to [`~PreTrainedModel.from_pretrained`] and set `device_map="auto"` to effectively distribute the model to your hardware:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)
```
## Add a new adapter

View File

@ -2,4 +2,3 @@
perf_infer_gpu_many: perf_infer_gpu_one
transformers_agents: agents
quantization: quantization/overview

View File

@ -28,8 +28,8 @@ An agent is a system that uses an LLM as its engine, and it has access to functi
These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them.
The agent can be programmed to:
- devise a series of actions/tools and run them all at once like the [`CodeAgent`] for example
- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one like the [`ReactJsonAgent`] for example
- devise a series of actions/tools and run them all at once like the `CodeAgent` for example
- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one like the `ReactJsonAgent` for example
### Types of agents
@ -42,8 +42,8 @@ This agent has a planning step, then generates python code to execute all its ac
This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations.
We implement two versions of ReactJsonAgent:
- [`ReactJsonAgent`] generates tool calls as a JSON in its output.
- [`ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance.
- [`~ReactJsonAgent`] generates tool calls as a JSON in its output.
- [`~ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance.
> [!TIP]
> Read the [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about the ReAct agent.
@ -124,7 +124,7 @@ You could use any `llm_engine` method as long as:
You also need a `tools` argument which accepts a list of `Tools`. You can provide an empty list for `tools`, but use the default toolbox with the optional argument `add_base_tools=True`.
Now you can create an agent, like [`CodeAgent`], and run it. For convenience, we also provide the [`HfEngine`] class that uses `huggingface_hub.InferenceClient` under the hood.
Now you can create an agent, like `CodeAgent`, and run it. For convenience, we also provide the `HfEngine` class that uses `huggingface_hub.InferenceClient` under the hood.
```python
from transformers import CodeAgent, HfEngine
@ -139,7 +139,7 @@ agent.run(
```
This will be handy in case of emergency baguette need!
You can even leave the argument `llm_engine` undefined, and an [`HfEngine`] will be created by default.
You can even leave the argument `llm_engine` undefined, and an [~HfEngine] will be created by default.
```python
from transformers import CodeAgent
@ -181,27 +181,13 @@ You can also run an agent consecutively for different tasks: each time the attri
A Python interpreter executes the code on a set of inputs passed along with your tools.
This should be safe because the only functions that can be called are the tools you provided (especially if it's only tools by Hugging Face) and the print function, so you're already limited in what can be executed.
The Python interpreter also doesn't allow imports by default outside of a safe list, so all the most obvious attacks shouldn't be an issue.
You can still authorize additional imports by passing the authorized modules as a list of strings in argument `additional_authorized_imports` upon initialization of your [`ReactCodeAgent`] or [`CodeAgent`]:
```py
>>> from transformers import ReactCodeAgent
>>> agent = ReactCodeAgent(tools=[], additional_authorized_imports=['requests', 'bs4'])
>>> agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?")
(...)
'Hugging Face Blog'
```
The Python interpreter also doesn't allow any attribute lookup or imports (which shouldn't be needed for passing inputs/outputs to a small set of functions) so all the most obvious attacks shouldn't be an issue.
The execution will stop at any code trying to perform an illegal operation or if there is a regular Python error with the code generated by the agent.
> [!WARNING]
> The LLM can generate arbitrary code that will then be executed: do not add any unsafe imports!
### The system prompt
An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the [`ReactCodeAgent`] (below version is slightly simplified).
An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the `ReactCodeAgent` (below version is slightly simplified).
```text
You will be given a task to solve as best you can.
@ -260,7 +246,7 @@ of the available tools.
A tool is an atomic function to be used by an agent.
You can for instance check the [`PythonInterpreterTool`]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.
You can for instance check the [~PythonInterpreterTool]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.
When the agent is initialized, the tool attributes are used to generate a tool description which is baked into the agent's system prompt. This lets the agent know which tools it can use and why.
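As a rough illustration of that structure, here is a minimal sketch of a custom tool (not part of this diff; the attribute names and the `"text"` type follow the agents API of this era and may differ in other versions):

```python
# Hypothetical example tool; names and types are assumptions based on the
# description above (a name, a description, input descriptions, an output type,
# and a __call__/forward method), not code taken from this diff.
from transformers import Tool


class WordCounterTool(Tool):
    name = "word_counter"
    description = "Counts the number of words in a piece of text and returns it as text."
    inputs = {
        "text": {"type": "text", "description": "The text whose words should be counted."},
    }
    output_type = "text"

    def forward(self, text: str) -> str:
        # Called through __call__ when the agent (or you) invokes the tool.
        return str(len(text.split()))


tool = WordCounterTool()
print(tool("Agents pick tools based on their descriptions."))  # "7"
```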
@ -273,7 +259,7 @@ Transformers comes with a default toolbox for empowering agents, that you can ad
- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
- **Translation**: translates a given sentence from source language to target language.
- **Python code interpreter**: runs the LLM-generated Python code in a secure environment. This tool will only be added to [`ReactJsonAgent`] if you use `add_base_tools=True`, since code-based tools can already execute Python code
- **Python code interpreter**: runs the LLM-generated Python code in a secure environment. This tool will only be added to [~ReactJsonAgent] if you use `add_base_tools=True`, since code-based tools can already execute Python code
You can manually use a tool by calling the [`load_tool`] function and a task to perform.
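For example, a minimal sketch of loading and calling a single default tool (the `"text-to-speech"` identifier is an assumption and may differ between versions; check the default toolbox of your installed release):

```python
from transformers import load_tool

# "text-to-speech" is assumed here; adjust to the exact task identifier
# exposed by your transformers version.
tts = load_tool("text-to-speech")
audio = tts("A tool can also be called directly, outside of an agent run.")
```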

View File

@ -169,7 +169,7 @@ Pretrained models are downloaded and locally cached at: `~/.cache/huggingface/hu
## Offline mode
Run 🤗 Transformers in a firewalled or offline environment with locally cached files by setting the environment variable `HF_HUB_OFFLINE=1`.
Run 🤗 Transformers in a firewalled or offline environment with locally cached files by setting the environment variable `TRANSFORMERS_OFFLINE=1`.
<Tip>
@ -178,7 +178,7 @@ Add [🤗 Datasets](https://huggingface.co/docs/datasets/) to your offline train
</Tip>
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
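The same behavior can also be reached from Python, either by exporting the variable before importing Transformers or by asking `from_pretrained` to use only the local cache; a small sketch (checkpoint name reused from the command above):

```py
import os

# Set the offline flag before transformers (and huggingface_hub) are imported.
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# local_files_only only succeeds if the files are already in the local cache.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small", local_files_only=True)
model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small", local_files_only=True)
```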

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# DETA
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The DETA model was proposed in [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.

View File

@ -16,36 +16,28 @@ rendered properly in your Markdown viewer.
# EfficientFormer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The EfficientFormer model was proposed in [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191)
The EfficientFormer model was proposed in [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191)
by Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. EfficientFormer proposes a
dimension-consistent pure transformer that can be run on mobile devices for dense prediction tasks like image classification, object
detection and semantic segmentation.
The abstract from the paper is the following:
*Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks.
However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-based models are generally
times slower than lightweight convolutional networks. Therefore, the deployment of ViT for real-time applications is particularly
challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computation
complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still
unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance?
To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs.
Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm.
Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer.
Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices.
Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on
iPhone 12 (compiled with CoreML), which { runs as fast as MobileNetV2×1.4 (1.6 ms, 74.7% top-1),} and our largest model,
EfficientFormer-L7, obtains 83.3% accuracy with only 7.0 ms latency. Our work proves that properly designed transformers can
*Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks.
However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-based models are generally
times slower than lightweight convolutional networks. Therefore, the deployment of ViT for real-time applications is particularly
challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computation
complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still
unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance?
To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs.
Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm.
Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer.
Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices.
Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on
iPhone 12 (compiled with CoreML), which { runs as fast as MobileNetV2×1.4 (1.6 ms, 74.7% top-1),} and our largest model,
EfficientFormer-L7, obtains 83.3% accuracy with only 7.0 ms latency. Our work proves that properly designed transformers can
reach extremely low latency on mobile devices while maintaining high performance.*
This model was contributed by [novice03](https://huggingface.co/novice03) and [Bearnardd](https://huggingface.co/Bearnardd).
@ -101,4 +93,4 @@ The original code can be found [here](https://github.com/snap-research/Efficient
- call
</tf>
</frameworkcontent>
</frameworkcontent>

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# ErnieM
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The ErnieM model was proposed in [ERNIE-M: Enhanced Multilingual Representation by Aligning

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# GPTSAN-japanese
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The GPTSAN-japanese model was released in the repository by Toshiyuki Sakamoto (tanreinama).

View File

@ -1,7 +1,7 @@
<!--Copyright 2022 The HuggingFace Team and Microsoft. All rights reserved.
Licensed under the MIT License; you may not use this file except in compliance with
the License.
the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
@ -14,17 +14,9 @@ rendered properly in your Markdown viewer.
# Graphormer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Graphormer model was proposed in [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by
The Graphormer model was proposed in [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen and Tie-Yan Liu. It is a Graph Transformer model, modified to allow computations on graphs instead of text sequences by generating embeddings and features of interest during preprocessing and collation, then using a modified attention.
The abstract from the paper is the following:

View File

@ -15,14 +15,6 @@ rendered properly in your Markdown viewer.
-->
# Jukebox
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf)
@ -35,7 +27,7 @@ The abstract from the paper is the following:
*We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable. We are releasing thousands of non cherry-picked samples, along with model weights and code.*
As shown on the following figure, Jukebox is made of 3 `priors` which are decoder only models. They follow the architecture described in [Generating Long Sequences with Sparse Transformers](https://arxiv.org/abs/1904.10509), modified to support longer context length.
First, an autoencoder is used to encode the text lyrics. Next, the first (also called `top_prior`) prior attends to the last hidden states extracted from the lyrics encoder. The priors are linked to the previous priors respectively via an `AudioConditioner` module. The `AudioConditioner` upsamples the outputs of the previous prior to raw tokens at a certain audio frame per second resolution.
First, an autoencoder is used to encode the text lyrics. Next, the first (also called `top_prior`) prior attends to the last hidden states extracted from the lyrics encoder. The priors are linked to the previous priors respectively via an `AudioConditioner` module. The `AudioConditioner` upsamples the outputs of the previous prior to raw tokens at a certain audio frame per second resolution.
The metadata such as *artist, genre and timing* are passed to each prior, in the form of a start token and positional embedding for the timing data. The hidden states are mapped to the closest codebook vector from the VQVAE in order to convert them to raw audio.
![JukeboxModel](https://gist.githubusercontent.com/ArthurZucker/92c1acaae62ebf1b6a951710bdd8b6af/raw/c9c517bf4eff61393f6c7dec9366ef02bdd059a3/jukebox.svg)

View File

@ -41,7 +41,6 @@ This model was contributed by [Shivalika Singh](https://huggingface.co/shivi) an
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Mask2Former.
- Demo notebooks regarding inference + fine-tuning Mask2Former on custom data can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Mask2Former).
- Scripts for finetuning [`Mask2Former`] with [`Trainer`] or [Accelerate](https://huggingface.co/docs/accelerate/index) can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/instance-segmentation).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
The resource should ideally demonstrate something new instead of duplicating an existing resource.

View File

@ -51,7 +51,6 @@ This model was contributed by [francesco](https://huggingface.co/francesco). The
<PipelineTag pipeline="image-segmentation"/>
- All notebooks that illustrate inference as well as fine-tuning on custom data with MaskFormer can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/MaskFormer).
- Scripts for finetuning [`MaskFormer`] with [`Trainer`] or [Accelerate](https://huggingface.co/docs/accelerate/index) can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/instance-segmentation).
## MaskFormer specific outputs

View File

@ -16,20 +16,12 @@ rendered properly in your Markdown viewer.
# MEGA
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The MEGA model was proposed in [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
MEGA proposes a new approach to self-attention with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism
stronger positional biases. This allows MEGA to perform competitively to Transformers on standard benchmarks including LRA
while also having significantly fewer parameters. MEGA's compute efficiency allows it to scale to very long sequences, making it an
MEGA proposes a new approach to self-attention with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism
stronger positional biases. This allows MEGA to perform competitively to Transformers on standard benchmarks including LRA
while also having significantly fewer parameters. MEGA's compute efficiency allows it to scale to very long sequences, making it an
attractive option for long-document NLP tasks.
The abstract from the paper is the following:
@ -42,8 +34,8 @@ The original code can be found [here](https://github.com/facebookresearch/mega).
## Usage tips
- MEGA can perform quite well with relatively few parameters. See Appendix D in the MEGA paper for examples of architectural specs which perform well in various settings. If using MEGA as a decoder, be sure to set `bidirectional=False` to avoid errors with default bidirectional.
- Mega-chunk is a variant of mega that reduces time and spaces complexity from quadratic to linear. Utilize chunking with MegaConfig.use_chunking and control chunk size with MegaConfig.chunk_size
- MEGA can perform quite well with relatively few parameters. See Appendix D in the MEGA paper for examples of architectural specs which perform well in various settings. If using MEGA as a decoder, be sure to set `bidirectional=False` to avoid errors with default bidirectional.
- Mega-chunk is a variant of mega that reduces time and spaces complexity from quadratic to linear. Utilize chunking with MegaConfig.use_chunking and control chunk size with MegaConfig.chunk_size
## Implementation Notes

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# Neighborhood Attention Transformer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143)

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# Nezha
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al.
@ -33,8 +25,8 @@ The abstract from the paper is the following:
*The pre-trained language models have achieved great successes in various natural language understanding (NLU) tasks
due to its capacity to capture the deep contextualized information in text by pre-training on large-scale corpora.
In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed
representation for CHinese lAnguage understanding) on Chinese corpora and finetuning for the Chinese NLU tasks.
The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional
representation for CHinese lAnguage understanding) on Chinese corpora and finetuning for the Chinese NLU tasks.
The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional
Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy,
Mixed Precision Training and the LAMB Optimizer in training the models. The experimental results show that NEZHA
achieves the state-of-the-art performances when finetuned on several representative Chinese tasks, including
@ -93,4 +85,4 @@ This model was contributed by [sijunhe](https://huggingface.co/sijunhe). The ori
## NezhaForQuestionAnswering
[[autodoc]] NezhaForQuestionAnswering
- forward
- forward

View File

@ -18,51 +18,11 @@ rendered properly in your Markdown viewer.
## Overview
The PaliGemma model was proposed in [PaliGemma Google's Cutting-Edge Open Vision Language Model](https://huggingface.co/blog/paligemma) by Google. It is a 3B vision-language model composed of a [SigLIP](siglip) vision encoder and a [Gemma](gemma) language decoder linked by a multimodal linear projection. It cuts an image into a fixed number of ViT tokens and prepends it to an optional prompt. One particularity is that the model uses full block attention on all the image tokens plus the input text tokens. It comes in 3 resolutions, 224x224, 448x448 and 896x896, with 3 base models, 55 fine-tuned versions for different tasks, and 2 mix models.
The PaliGemma model was proposed by Google. It is a 3B VLM composed of a Siglip-400m vision encoder and a Gemma-2B decoder linked by a multimodal linear projection. It is not a chat model with images. It cuts an image into a fixed number of ViT tokens and prepends it to an optional prompt. One particularity is that the model uses full block attention on all the image tokens plus the input text tokens. It comes in 3 resolutions, 224x224, 448x448 and 896x896, with 3 base models, 55 fine-tuned versions for different tasks, and 2 mix models.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/paligemma/paligemma_arch.png"
alt="drawing" width="600"/>
<small> PaliGemma architecture. Taken from the <a href="https://huggingface.co/blog/paligemma">blog post.</a> </small>
This model was contributed by [Molbap](https://huggingface.co/Molbap).
## Usage tips
Inference with PaliGemma can be performed as follows:
```python
import requests
from PIL import Image

from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
model_id = "google/paligemma-3b-mix-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
prompt = "What is on the flower?"
image_file = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg?download=true"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True)[len(prompt):])
```
- PaliGemma is not meant for conversational use, and it works best when fine-tuning to a specific use case. Some downstream tasks on which PaliGemma can be fine-tuned include image captioning, visual question answering (VQA), object detection, referring expression segmentation and document understanding.
- One can use `PaliGemmaProcessor` to prepare images, text and optional labels for the model. When fine-tuning a PaliGemma model, the `suffix` argument can be passed to the processor which creates the `labels` for the model:
```python
prompt = "What is on the flower?"
answer = "a bee"
inputs = processor(text=prompt, images=raw_image, suffix=answer, return_tensors="pt")
```
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with PaliGemma. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
- A blog post introducing all the features of PaliGemma can be found [here](https://huggingface.co/blog/paligemma).
- Demo notebooks on how to fine-tune PaliGemma for VQA with the Trainer API along with inference can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/paligemma).
- Demo notebooks on how to fine-tune PaliGemma on a custom dataset (receipt image -> JSON) along with inference can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/PaliGemma). 🌎
## PaliGemmaConfig

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# QDQBERT
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The QDQBERT model can be referenced in [Integer Quantization for Deep Learning Inference: Principles and Empirical

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# REALM
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. It's a
@ -94,4 +86,4 @@ This model was contributed by [qqaatw](https://huggingface.co/qqaatw). The origi
[[autodoc]] RealmForOpenQA
- block_embedding_to
- forward
- forward

View File

@ -81,10 +81,10 @@ processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")
img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
mask_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
segmentation_map = Image.open(requests.get(mask_url, stream=True).raw).convert("1")
segmentation_map = Image.open(requests.get(mask_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]] # 2D location of a window in the image
inputs = processor(raw_image, input_points=input_points, segmentation_maps=segmentation_map, return_tensors="pt").to(device)
inputs = processor(raw_image, input_points=input_points, segmentation_maps=mask, return_tensors="pt").to(device)
with torch.no_grad():
outputs = model(**inputs)

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# Speech2Text2
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Speech2Text2 model is used together with [Wav2Vec2](wav2vec2) for Speech Translation models proposed in

View File

@ -38,17 +38,12 @@ to repeatedly detect a much richer set of interest points than the initial pre-a
traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches
when compared to LIFT, SIFT and ORB.*
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/superpoint_architecture.png"
alt="drawing" width="500"/>
<small> SuperPoint overview. Taken from the <a href="https://arxiv.org/abs/1712.07629v4">original paper.</a> </small>
## Usage tips
## How to use
Here is a quick example of using the model to detect interest points in an image:
```python
from transformers import AutoImageProcessor, SuperPointForKeypointDetection
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests
@ -57,7 +52,7 @@ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
model = AutoModel.from_pretrained("magic-leap-community/superpoint")
inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
@ -69,7 +64,7 @@ You can also feed multiple images to the model. Due to the nature of SuperPoint,
you will need to use the mask attribute to retrieve the respective information :
```python
from transformers import AutoImageProcessor, SuperPointForKeypointDetection
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests
@ -82,7 +77,7 @@ image_2 = Image.open(requests.get(url_image_2, stream=True).raw)
images = [image_1, image_2]
processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
model = AutoModel.from_pretrained("magic-leap-community/superpoint")
inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)
@ -108,12 +103,6 @@ cv2.imwrite("output_image.png", image)
This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
The original code can be found [here](https://github.com/magicleap/SuperPointPretrainedNetwork).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SuperPoint. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
- A notebook showcasing inference and visualization with SuperPoint can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SuperPoint/Inference_with_SuperPoint_to_detect_interest_points_in_an_image.ipynb). 🌎
## SuperPointConfig
[[autodoc]] SuperPointConfig

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# TVLT
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The TVLT model was proposed in [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156)
@ -68,7 +60,7 @@ The original code can be found [here](https://github.com/zinengtang/TVLT). This
[[autodoc]] TvltFeatureExtractor
- __call__
## TvltModel
[[autodoc]] TvltModel

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# Hybrid Vision Transformer (ViT Hybrid)
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The hybrid Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition

View File

@ -30,7 +30,7 @@ Tips:
- Usage of X-CLIP is identical to [CLIP](clip).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/xclip_architecture.png"
alt="drawing" width="600"/>
alt="drawing" width="600"/>
<small> X-CLIP architecture. Taken from the <a href="https://arxiv.org/abs/2208.02816">original paper.</a> </small>

View File

@ -16,14 +16,6 @@ rendered properly in your Markdown viewer.
# XLM-ProphetNet
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=xprophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">

View File

@ -81,17 +81,15 @@ model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)
```
Check out the [API documentation](#transformers.integrations.PeftAdapterMixin) section below for more details.
## Load in 8bit or 4bit
The `bitsandbytes` integration supports 8bit and 4bit precision data types, which are useful for loading large models because it saves memory (see the `bitsandbytes` integration [guide](./quantization#bitsandbytes-integration) to learn more). Add the `load_in_8bit` or `load_in_4bit` parameters to [`~PreTrainedModel.from_pretrained`] and set `device_map="auto"` to effectively distribute the model to your hardware:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)
```
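For the 4-bit case mentioned above, a minimal sketch following the same pattern (this assumes a CUDA GPU with `bitsandbytes` installed; it is not part of the diff):

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

peft_model_id = "ybelkada/opt-350m-lora"
# 4-bit counterpart of the 8-bit example above; device_map="auto" lets
# Accelerate spread the quantized weights over the available devices.
model = AutoModelForCausalLM.from_pretrained(
    peft_model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```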
## Add a new adapter
@ -229,19 +227,6 @@ lora_config = LoraConfig(
model.add_adapter(lora_config)
```
## API docs
[[autodoc]] integrations.PeftAdapterMixin
- load_adapter
- add_adapter
- set_adapter
- disable_adapters
- enable_adapters
- active_adapters
- get_adapter_state_dict
<!--
TODO: (@younesbelkada @stevhliu)

View File

@ -354,20 +354,20 @@ If you're curious and interested in learning more about the concepts underlying
To load a model in 8-bit for inference, use the `load_in_8bit` parameter. The `device_map` parameter is optional, but we recommend setting it to `"auto"` to allow 🤗 Accelerate to automatically and efficiently allocate the model given the available resources in the environment:
```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformers import AutoModelForCausalLM
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
If you're loading a model in 8-bit for text generation, you should use the [`~transformers.GenerationMixin.generate`] method instead of the [`Pipeline`] function which is not optimized for 8-bit models and will be slower. Some sampling strategies, like nucleus sampling, are also not supported by the [`Pipeline`] for 8-bit models. You should also place all inputs on the same device as the model:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "bigscience/bloom-2b5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
prompt = "Hello, my llama is cute"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
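# (Hedged continuation, not shown in this hunk: generation then follows the usual
# pattern, with the inputs already on the same device as the model.)
generated_ids = model_8bit.generate(**inputs)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)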

View File

@ -52,7 +52,7 @@ Use the table below to help you decide which quantization method to use.
| [bitsandbytes](./bitsandbytes) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/TimDettmers/bitsandbytes |
| [EETQ](./eetq) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
| GGUF / GGML (llama.cpp) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 1 - 8 | 🔴 | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
| [GPTQ](./gptq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 2 - 3 - 4 - 8 | 🟢 | 🟢 | 🟢 | https://github.com/AutoGPTQ/AutoGPTQ |
| [GPTQ](./gptq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/AutoGPTQ/AutoGPTQ |
| [HQQ](./hqq) | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 1 - 8 | 🟢 | 🔴 | 🟢 | https://github.com/mobiusml/hqq/ |
| [Quanto](./quanto) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 2 / 4 / 8 | 🔴 | 🔴 | 🟢 | https://github.com/huggingface/quanto |
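As a concrete illustration of the GPTQ row above, a hedged sketch of quantizing a model with `GPTQConfig` (the example checkpoint and calibration dataset are assumptions; quantization additionally needs `optimum` and `auto-gptq` plus a GPU):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # example checkpoint, not taken from this diff
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bits=4 is listed on both sides of the diff; the main branch also lists 2, 3 and 8.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=gptq_config)
```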

View File

@ -204,7 +204,7 @@ Pass your text to the tokenizer:
The tokenizer returns a dictionary containing:
* [input_ids](./glossary#input-ids): numerical representations of your tokens.
* [attention_mask](./glossary#attention-mask): indicates which tokens should be attended to.
* [attention_mask](.glossary#attention-mask): indicates which tokens should be attended to.
A tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:
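A short sketch of what that looks like (the checkpoint is only an example):

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

batch = tokenizer(
    ["A short sentence.", "A somewhat longer second sentence to pad against."],
    padding=True,       # pad to the longest sequence in the batch
    truncation=True,    # cut sequences that exceed the model's maximum length
    return_tensors="pt",
)
print(batch["input_ids"].shape)   # both rows have the same length thanks to padding
print(batch["attention_mask"])    # 0s mark the padded positions
```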

View File

@ -154,7 +154,7 @@ Pretrained models are downloaded and locally cached at: `~/.
## Offline mode
🤗 Transformers can run in a firewalled or offline environment using only local files. Set the environment variable `HF_HUB_OFFLINE=1` to enable this behavior.
🤗 Transformers can run in a firewalled or offline environment using only local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this behavior.
<Tip>
@ -171,7 +171,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Run this same program in an offline instance with the following command:
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -171,7 +171,7 @@ Pretrained models are downloaded and cached locally in
## Offline mode
🤗 Transformers can work in a firewalled or offline environment by using only local files. Set the environment variable `HF_HUB_OFFLINE=1` to enable this mode.
🤗 Transformers can work in a firewalled or offline environment by using only local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this mode.
<Tip>
@ -180,7 +180,7 @@ Add [🤗 Datasets](https://huggingface.co/docs/datasets/) to your offline train
</Tip>
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -152,7 +152,7 @@ Pretrained models are downloaded and cached locally at: `
## Offline mode
🤗 Transformers can run in a firewalled or offline environment by using only local files. Set the environment variable `HF_HUB_OFFLINE=1` to enable this behavior.
🤗 Transformers can run in a firewalled or offline environment by using only local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this behavior.
<Tip>
@ -169,7 +169,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Run the same program in an offline instance with:
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -55,10 +55,10 @@ Below are some notes to help you use this module, or
After installing the required libraries, here is how to load your mixed 8-bit model:
```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformers import AutoModelForCausalLM
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
For text generation, we recommend:
@ -69,11 +69,11 @@ For text generation, we recommend:
Here is a simple example:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "bigscience/bloom-2b5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
text = "Hello, my llama is cute"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
@ -87,7 +87,7 @@ outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
Use the following way to load the mixed-8bit model on multiple GPUs (same command as the single-GPU configuration):
```py
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
You can control the amount of GPU RAM you want to allocate to each GPU with `accelerate`. Use the `max_memory` argument as follows:
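The example referenced by this sentence falls outside the hunk shown above; as a minimal sketch (an editorial addition, with illustrative memory limits), such a call could look like:
```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative per-GPU limits; adjust keys and sizes to the hardware actually available.
max_memory_mapping = {0: "1GB", 1: "2GB"}
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    max_memory=max_memory_mapping,
)
```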

View File

@ -157,7 +157,7 @@ conda install conda-forge::transformers
## Offline mode
🤗 Transformers can also run in a firewalled or offline environment by using only local files. To enable this behavior, set the environment variable `HF_HUB_OFFLINE=1`.
🤗 Transformers can also run in a firewalled or offline environment by using only local files. To enable this behavior, set the environment variable `TRANSFORMERS_OFFLINE=1`.
<Tip>
@ -174,7 +174,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Run this same program in an offline instance:
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -245,12 +245,12 @@ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_i
```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "bigscience/bloom-1b7"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
```
Then, use your model as you would normally use a [`PreTrainedModel`].
@ -321,9 +321,9 @@ model_double_quant = AutoModelForCausalLM.from_pretrained(model_id, quantization
To be able to use this feature, make sure to use `bitsandbytes>0.37.2` (at the time of writing, we tested with `bitsandbytes==0.38.0.post1`).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model.push_to_hub("bloom-560m-8bit")

View File

@ -91,10 +91,10 @@ model.load_adapter(peft_model_id)
The `bitsandbytes` integration supports 8-bit and 4-bit precision data types, which help save memory when loading large models (see the `bitsandbytes` integration [guide](./quantization#bitsandbytes-integration) for more details). Add the `load_in_8bit` or `load_in_4bit` parameter to [`~PreTrainedModel.from_pretrained`] and set `device_map="auto"` to distribute the model effectively across your hardware:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)
```
## Add a new adapter

View File

@ -357,10 +357,10 @@ Int8混合精度行列分解は、行列乗算を2つのストリームに分割
After installing the required libraries, here is how to load a mixed 8-bit model:
```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformers import AutoModelForCausalLM
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
Here is a simple example:
@ -370,11 +370,11 @@ model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_confi
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "bigscience/bloom-2b5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
prompt = "Hello, my llama is cute"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
@ -388,7 +388,7 @@ outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```py
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
To control the GPU RAM allocated to each GPU with `accelerate`, use the `max_memory` argument as follows:

View File

@ -157,7 +157,7 @@ conda install conda-forge::transformers
## Offline mode[[offline-mode]]
🤗 Transformers can run in a firewalled or offline environment by using only local files. Set the environment variable `HF_HUB_OFFLINE=1` to enable it.
🤗 Transformers can run in a firewalled or offline environment by using only local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable it.
<Tip>
@ -174,7 +174,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
You can run the same program on an offline device as follows.
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -86,10 +86,10 @@ model.load_adapter(peft_model_id)
The `bitsandbytes` integration supports 8-bit and 4-bit precision data types, which is useful for loading large models while also saving memory. To distribute the model effectively across your hardware, add the `load_in_8bit` or `load_in_4bit` parameter to [`~PreTrainedModel.from_pretrained`] and set `device_map="auto"`:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)
```
## Add a new adapter [[add-a-new-adapter]]

View File

@ -127,10 +127,10 @@ Int8 혼합 정밀도 행렬 분해는 행렬 곱셈을 두 개의 스트림으
After installing the required libraries, here is how to load a mixed 8-bit model:
```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformers import AutoModelForCausalLM
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
For text generation:
@ -141,11 +141,11 @@ model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_confi
Here is a simple example:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "bigscience/bloom-2b5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
prompt = "Hello, my llama is cute"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
@ -159,7 +159,7 @@ outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
Loading a mixed 8-bit model on multiple GPUs works the same as the single-GPU setup (same command):
```py
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
```
However, you can control the GPU RAM allocated to each GPU with `accelerate`. Use the `max_memory` argument as follows:

View File

@ -173,7 +173,7 @@ No Windows, este diretório pré-definido é dado por `C:\Users\username\.cache\
## Offline mode
🤗 Transformers can also be run in a firewalled or offline environment using local files.
To do so, set the environment variable `HF_HUB_OFFLINE=1`.
To do so, set the environment variable `TRANSFORMERS_OFFLINE=1`.
<Tip>
@ -191,7 +191,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Run this same program in an offline instance with the following command:
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -169,7 +169,7 @@ conda install conda-forge::transformers
## Offline mode
🤗 Transformers can run in a firewalled or offline environment using only local files. Set the environment variable `HF_HUB_OFFLINE=1` to enable this behavior.
🤗 Transformers can run in a firewalled or offline environment using only local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this behavior.
<Tip>
@ -186,7 +186,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Run the same program in an offline environment:
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

View File

@ -360,12 +360,12 @@ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_i
```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "bigscience/bloom-1b7"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
```
Then, use your model as you would normally use a `PreTrainedModel`.
@ -441,9 +441,9 @@ model_double_quant = AutoModelForCausalLM.from_pretrained(model_id, quantization
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model.push_to_hub("bloom-560m-8bit")

View File

@ -86,10 +86,10 @@ model.load_adapter(peft_model_id)
The `bitsandbytes` integration supports 8-bit and 4-bit precision data types, which is useful for loading large models because it saves memory (see the `bitsandbytes` [guide](./quantization#bitsandbytes-integration) for more information). To distribute the model effectively across your hardware, add the `load_in_8bit` or `load_in_4bit` parameter to [`~PreTrainedModel.from_pretrained`] and set `device_map="auto"`:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)
```
## Add a new adapter

View File

@ -47,7 +47,6 @@ Coming soon!
| [**`image-classification`**](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) | [CIFAR-10](https://huggingface.co/datasets/cifar10) | ✅ | ✅ |✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb)
| [**`semantic-segmentation`**](https://github.com/huggingface/transformers/tree/main/examples/pytorch/semantic-segmentation) | [SCENE_PARSE_150](https://huggingface.co/datasets/scene_parse_150) | ✅ | ✅ |✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/semantic_segmentation.ipynb)
| [**`object-detection`**](https://github.com/huggingface/transformers/tree/main/examples/pytorch/object-detection) | [CPPE-5](https://huggingface.co/datasets/cppe-5) | ✅ | ✅ |✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/pytorch/object_detection.ipynb)
| [**`instance-segmentation`**](https://github.com/huggingface/transformers/tree/main/examples/pytorch/instance-segmentation) | [ADE20K sample](https://huggingface.co/datasets/qubvel-hf/ade20k-mini) | ✅ | ✅ |✅ |
## Running quick tests

View File

@ -1,235 +0,0 @@
<!---
Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Instance Segmentation Examples
This directory contains two scripts that demonstrate how to fine-tune [MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer) and [Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former) for instance segmentation using PyTorch.
For other instance segmentation models, such as [DETR](https://huggingface.co/docs/transformers/model_doc/detr) and [Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr), the scripts need to be adjusted to properly handle input and output data.
Content:
- [PyTorch Version with Trainer](#pytorch-version-with-trainer)
- [PyTorch Version with Accelerate](#pytorch-version-with-accelerate)
- [Reload and Perform Inference](#reload-and-perform-inference)
- [Note on Custom Data](#note-on-custom-data)
## PyTorch Version with Trainer
This example is based on the script [`run_instance_segmentation.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/instance-segmentation/run_instance_segmentation.py).
The script uses the [🤗 Trainer API](https://huggingface.co/docs/transformers/main_classes/trainer) to manage training automatically, including distributed environments.
Here, we show how to fine-tune a [Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former) model on a subsample of the [ADE20K](https://huggingface.co/datasets/zhoubolei/scene_parse_150) dataset. We created a [small dataset](https://huggingface.co/datasets/qubvel-hf/ade20k-mini) with approximately 2,000 images containing only "person" and "car" annotations; all other pixels are marked as "background."
Here is the `label2id` mapping for this dataset:
```python
label2id = {
"background": 0,
"person": 1,
"car": 2,
}
```
Since the `background` label is not an instance and we don't want to predict it, we will use `do_reduce_labels` to remove it from the data.
Run the training with the following command:
```bash
python run_instance_segmentation.py \
--model_name_or_path facebook/mask2former-swin-tiny-coco-instance \
--output_dir finetune-instance-segmentation-ade20k-mini-mask2former \
--dataset_name qubvel-hf/ade20k-mini \
--do_reduce_labels \
--image_height 256 \
--image_width 256 \
--do_train \
--fp16 \
--num_train_epochs 40 \
--learning_rate 1e-5 \
--lr_scheduler_type constant \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 2 \
--dataloader_num_workers 8 \
--dataloader_persistent_workers \
--dataloader_prefetch_factor 4 \
--do_eval \
--evaluation_strategy epoch \
--logging_strategy epoch \
--save_strategy epoch \
--save_total_limit 2 \
--push_to_hub
```
The resulting model can be viewed [here](https://huggingface.co/qubvel-hf/finetune-instance-segmentation-ade20k-mini-mask2former). Always refer to the original paper for details on training hyperparameters. To improve model quality, consider:
- Changing image size parameters (`--image_height`/`--image_width`)
- Adjusting training parameters such as learning rate, batch size, warmup, optimizer, and more (see [TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments))
- Adding more image augmentations (we created a helpful [HF Space](https://huggingface.co/spaces/qubvel-hf/albumentations-demo) to choose some)
You can also replace the model [checkpoint](https://huggingface.co/models?search=maskformer).
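As a complement to the augmentation bullet above, here is a minimal sketch (an editorial addition, not from the original README) of a slightly richer `albumentations` pipeline in the spirit of the one used by the training script; the extra transforms and probabilities are illustrative:
```python
import albumentations as A

# Illustrative augmentation pipeline; tune the transforms and probabilities for your data.
train_augment_and_transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.1),
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.3),
        A.GaussianBlur(blur_limit=(3, 5), p=0.1),
    ]
)
```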
## PyTorch Version with Accelerate
This example is based on the script [`run_instance_segmentation_no_trainer.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py).
The script uses [🤗 Accelerate](https://github.com/huggingface/accelerate) to write your own training loop in PyTorch and run it on various environments, including CPU, multi-CPU, GPU, multi-GPU, and TPU, with support for mixed precision.
First, configure the environment:
```bash
accelerate config
```
Answer the questions regarding your training environment. Then, run:
```bash
accelerate test
```
This command ensures everything is ready for training. Finally, launch training with:
```bash
accelerate launch run_instance_segmentation_no_trainer.py \
--model_name_or_path facebook/mask2former-swin-tiny-coco-instance \
--output_dir finetune-instance-segmentation-ade20k-mini-mask2former-no-trainer \
--dataset_name qubvel-hf/ade20k-mini \
--do_reduce_labels \
--image_height 256 \
--image_width 256 \
--num_train_epochs 40 \
--learning_rate 1e-5 \
--lr_scheduler_type constant \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 2 \
--dataloader_num_workers 8 \
--push_to_hub
```
With this setup, you can train on multiple GPUs, log everything to trackers (such as Weights & Biases or TensorBoard), and regularly push your model to the Hub (with the repo name set to `args.output_dir` under your HF username).
With the default settings, the script fine-tunes a [Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former) model on a sample of the [ADE20K](https://huggingface.co/datasets/qubvel-hf/ade20k-mini) dataset. The resulting model can be viewed [here](https://huggingface.co/qubvel-hf/finetune-instance-segmentation-ade20k-mini-mask2former-no-trainer).
## Reload and Perform Inference
After training, you can easily load your trained model and perform inference as follows:
```python
import torch
import requests
import matplotlib.pyplot as plt
from PIL import Image
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor
# Load image
image = Image.open(requests.get("http://farm4.staticflickr.com/3017/3071497290_31f0393363_z.jpg", stream=True).raw)
# Load model and image processor
device = "cuda"
checkpoint = "qubvel-hf/finetune-instance-segmentation-ade20k-mini-mask2former"
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint, device_map=device)
image_processor = Mask2FormerImageProcessor.from_pretrained(checkpoint)
# Run inference on image
inputs = image_processor(images=[image], return_tensors="pt").to(device)
with torch.no_grad():
outputs = model(**inputs)
# Post-process outputs
outputs = image_processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])
print("Mask shape: ", outputs[0]["segmentation"].shape)
print("Mask values: ", outputs[0]["segmentation"].unique())
for segment in outputs[0]["segments_info"]:
print("Segment: ", segment)
```
```
Mask shape: torch.Size([427, 640])
Mask values: tensor([-1., 0., 1., 2., 3., 4., 5., 6.])
Segment: {'id': 0, 'label_id': 0, 'was_fused': False, 'score': 0.946127}
Segment: {'id': 1, 'label_id': 1, 'was_fused': False, 'score': 0.961582}
Segment: {'id': 2, 'label_id': 1, 'was_fused': False, 'score': 0.968367}
Segment: {'id': 3, 'label_id': 1, 'was_fused': False, 'score': 0.819527}
Segment: {'id': 4, 'label_id': 1, 'was_fused': False, 'score': 0.655761}
Segment: {'id': 5, 'label_id': 1, 'was_fused': False, 'score': 0.531299}
Segment: {'id': 6, 'label_id': 1, 'was_fused': False, 'score': 0.929477}
```
Use the following code to visualize the results:
```python
import numpy as np
import matplotlib.pyplot as plt
segmentation = outputs[0]["segmentation"].numpy()
plt.figure(figsize=(10, 10))
plt.subplot(1, 2, 1)
plt.imshow(np.array(image))
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(segmentation)
plt.axis("off")
plt.show()
```
![Result](https://i.imgur.com/rZmaRjD.png)
## Note on Custom Data
Here is a short script demonstrating how to create your own dataset for instance segmentation and push it to the hub:
> Note: Annotations should be represented as 3-channel images (similar to the [scene_parsing_150](https://huggingface.co/datasets/zhoubolei/scene_parse_150#instance_segmentation-1) dataset). The first channel is a semantic-segmentation map with values corresponding to `label2id`, the second is an instance-segmentation map where each instance has a unique value, and the third channel should be empty (filled with zeros).
```python
from datasets import Dataset, DatasetDict
from datasets import Image as DatasetImage
label2id = {
"background": 0,
"person": 1,
"car": 2,
}
train_split = {
"image": [<PIL Image 1>, <PIL Image 2>, <PIL Image 3>, ...],
"annotation": [<PIL Image ann 1>, <PIL Image ann 2>, <PIL Image ann 3>, ...],
}
validation_split = {
"image": [<PIL Image 101>, <PIL Image 102>, <PIL Image 103>, ...],
"annotation": [<PIL Image ann 101>, <PIL Image ann 102>, <PIL Image ann 103>, ...],
}
def create_instance_segmentation_dataset(label2id, **splits):
dataset_dict = {}
for split_name, split in splits.items():
split["semantic_class_to_id"] = [label2id] * len(split["image"])
dataset_split = (
Dataset.from_dict(split)
.cast_column("image", DatasetImage())
.cast_column("annotation", DatasetImage())
)
dataset_dict[split_name] = dataset_split
return DatasetDict(dataset_dict)
dataset = create_instance_segmentation_dataset(label2id, train=train_split, validation=validation_split)
dataset.push_to_hub("qubvel-hf/ade20k-nano")
```
Use this dataset for fine-tuning by specifying its name with `--dataset_name <your_dataset_repo>`.
See also: [Dataset Creation Guide](https://huggingface.co/docs/datasets/image_dataset#create-an-image-dataset)
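To complement the note above on the 3-channel annotation format, here is a minimal sketch (an editorial addition, with placeholder masks) of how one annotation image can be assembled from a semantic mask and an instance mask:
```python
import numpy as np
from PIL import Image

height, width = 256, 256

# Placeholder masks: replace with your real per-pixel class ids and instance ids.
semantic_mask = np.zeros((height, width), dtype=np.uint8)  # values follow label2id
instance_mask = np.zeros((height, width), dtype=np.uint8)  # one unique value per instance

# Channel 0: semantic classes, channel 1: instance ids, channel 2: left empty (zeros).
annotation = np.stack(
    [semantic_mask, instance_mask, np.zeros((height, width), dtype=np.uint8)],
    axis=-1,
)
annotation_image = Image.fromarray(annotation)  # one of the <PIL Image ann N> entries above
```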

View File

@ -1,5 +0,0 @@
albumentations >= 1.4.5
timm
datasets
torchmetrics
pycocotools

View File

@ -1,469 +0,0 @@
#!/usr/bin/env python
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
"""Finetuning 🤗 Transformers model for instance segmentation leveraging the Trainer API."""
import logging
import os
import sys
from dataclasses import dataclass, field
from functools import partial
from typing import Any, Dict, List, Mapping, Optional
import albumentations as A
import numpy as np
import torch
from datasets import load_dataset
from torchmetrics.detection.mean_ap import MeanAveragePrecision
import transformers
from transformers import (
AutoImageProcessor,
AutoModelForUniversalSegmentation,
HfArgumentParser,
Trainer,
TrainingArguments,
)
from transformers.image_processing_utils import BatchFeature
from transformers.trainer import EvalPrediction
from transformers.trainer_utils import get_last_checkpoint
from transformers.utils import check_min_version, send_example_telemetry
from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.42.0.dev0")
require_version("datasets>=2.0.0", "To fix: pip install -r examples/pytorch/instance-segmentation/requirements.txt")
@dataclass
class Arguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
Using `HfArgumentParser` we can turn this class into argparse arguments to be able to specify
them on the command line.
"""
model_name_or_path: str = field(
default="facebook/mask2former-swin-tiny-coco-instance",
metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"},
)
dataset_name: str = field(
default="qubvel-hf/ade20k-mini",
metadata={
"help": "Name of a dataset from the hub (could be your own, possibly private dataset hosted on the hub)."
},
)
image_height: Optional[int] = field(default=512, metadata={"help": "Image height after resizing."})
image_width: Optional[int] = field(default=512, metadata={"help": "Image width after resizing."})
token: str = field(
default=None,
metadata={
"help": (
"The token to use as HTTP bearer authorization for remote files. If not specified, will use the token "
"generated when running `huggingface-cli login` (stored in `~/.huggingface`)."
)
},
)
do_reduce_labels: bool = field(
default=False,
metadata={
"help": (
"If background class is labeled as 0 and you want to remove it from the labels, set this flag to True."
)
},
)
def augment_and_transform_batch(
examples: Mapping[str, Any], transform: A.Compose, image_processor: AutoImageProcessor
) -> BatchFeature:
batch = {
"pixel_values": [],
"mask_labels": [],
"class_labels": [],
}
for pil_image, pil_annotation in zip(examples["image"], examples["annotation"]):
image = np.array(pil_image)
semantic_and_instance_masks = np.array(pil_annotation)[..., :2]
# Apply augmentations
output = transform(image=image, mask=semantic_and_instance_masks)
aug_image = output["image"]
aug_semantic_and_instance_masks = output["mask"]
aug_instance_mask = aug_semantic_and_instance_masks[..., 1]
# Create mapping from instance id to semantic id
unique_semantic_id_instance_id_pairs = np.unique(aug_semantic_and_instance_masks.reshape(-1, 2), axis=0)
instance_id_to_semantic_id = {
instance_id: semantic_id for semantic_id, instance_id in unique_semantic_id_instance_id_pairs
}
# Apply the image processor transformations: resizing, rescaling, normalization
model_inputs = image_processor(
images=[aug_image],
segmentation_maps=[aug_instance_mask],
instance_id_to_semantic_id=instance_id_to_semantic_id,
return_tensors="pt",
)
batch["pixel_values"].append(model_inputs.pixel_values[0])
batch["mask_labels"].append(model_inputs.mask_labels[0])
batch["class_labels"].append(model_inputs.class_labels[0])
return batch
def collate_fn(examples):
batch = {}
batch["pixel_values"] = torch.stack([example["pixel_values"] for example in examples])
batch["class_labels"] = [example["class_labels"] for example in examples]
batch["mask_labels"] = [example["mask_labels"] for example in examples]
if "pixel_mask" in examples[0]:
batch["pixel_mask"] = torch.stack([example["pixel_mask"] for example in examples])
return batch
@dataclass
class ModelOutput:
class_queries_logits: torch.Tensor
masks_queries_logits: torch.Tensor
def nested_cpu(tensors):
if isinstance(tensors, (list, tuple)):
return type(tensors)(nested_cpu(t) for t in tensors)
elif isinstance(tensors, Mapping):
return type(tensors)({k: nested_cpu(t) for k, t in tensors.items()})
elif isinstance(tensors, torch.Tensor):
return tensors.cpu().detach()
else:
return tensors
class Evaluator:
"""
Compute metrics for the instance segmentation task.
"""
def __init__(
self,
image_processor: AutoImageProcessor,
id2label: Mapping[int, str],
threshold: float = 0.0,
):
"""
Initialize evaluator with image processor, id2label mapping and threshold for filtering predictions.
Args:
image_processor (AutoImageProcessor): Image processor for
`post_process_instance_segmentation` method.
id2label (Mapping[int, str]): Mapping from class id to class name.
threshold (float): Threshold to filter predicted boxes by confidence. Defaults to 0.0.
"""
self.image_processor = image_processor
self.id2label = id2label
self.threshold = threshold
self.metric = self.get_metric()
def get_metric(self):
metric = MeanAveragePrecision(iou_type="segm", class_metrics=True)
return metric
def reset_metric(self):
self.metric.reset()
def postprocess_target_batch(self, target_batch) -> List[Dict[str, torch.Tensor]]:
"""Collect targets in a form of list of dictionaries with keys "masks", "labels"."""
batch_masks = target_batch[0]
batch_labels = target_batch[1]
post_processed_targets = []
for masks, labels in zip(batch_masks, batch_labels):
post_processed_targets.append(
{
"masks": masks.to(dtype=torch.bool),
"labels": labels,
}
)
return post_processed_targets
def get_target_sizes(self, post_processed_targets) -> List[List[int]]:
target_sizes = []
for target in post_processed_targets:
target_sizes.append(target["masks"].shape[-2:])
return target_sizes
def postprocess_prediction_batch(self, prediction_batch, target_sizes) -> List[Dict[str, torch.Tensor]]:
"""Collect predictions in a form of list of dictionaries with keys "masks", "labels", "scores"."""
model_output = ModelOutput(class_queries_logits=prediction_batch[0], masks_queries_logits=prediction_batch[1])
post_processed_output = self.image_processor.post_process_instance_segmentation(
model_output,
threshold=self.threshold,
target_sizes=target_sizes,
return_binary_maps=True,
)
post_processed_predictions = []
for image_predictions, target_size in zip(post_processed_output, target_sizes):
if image_predictions["segments_info"]:
post_processed_image_prediction = {
"masks": image_predictions["segmentation"].to(dtype=torch.bool),
"labels": torch.tensor([x["label_id"] for x in image_predictions["segments_info"]]),
"scores": torch.tensor([x["score"] for x in image_predictions["segments_info"]]),
}
else:
# for void predictions, we need to provide empty tensors
post_processed_image_prediction = {
"masks": torch.zeros([0, *target_size], dtype=torch.bool),
"labels": torch.tensor([]),
"scores": torch.tensor([]),
}
post_processed_predictions.append(post_processed_image_prediction)
return post_processed_predictions
@torch.no_grad()
def __call__(self, evaluation_results: EvalPrediction, compute_result: bool = False) -> Mapping[str, float]:
"""
Update metrics with current evaluation results and return metrics if `compute_result` is True.
Args:
evaluation_results (EvalPrediction): Predictions and targets from evaluation.
compute_result (bool): Whether to compute and return metrics.
Returns:
Mapping[str, float]: Metrics in a form of dictionary {<metric_name>: <metric_value>}
"""
prediction_batch = nested_cpu(evaluation_results.predictions)
target_batch = nested_cpu(evaluation_results.label_ids)
# For metric computation we need to provide:
# - targets in a form of list of dictionaries with keys "masks", "labels"
# - predictions in a form of list of dictionaries with keys "masks", "labels", "scores"
post_processed_targets = self.postprocess_target_batch(target_batch)
target_sizes = self.get_target_sizes(post_processed_targets)
post_processed_predictions = self.postprocess_prediction_batch(prediction_batch, target_sizes)
# Compute metrics
self.metric.update(post_processed_predictions, post_processed_targets)
if not compute_result:
return
metrics = self.metric.compute()
# Replace list of per class metrics with separate metric for each class
classes = metrics.pop("classes")
map_per_class = metrics.pop("map_per_class")
mar_100_per_class = metrics.pop("mar_100_per_class")
for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):
class_name = self.id2label[class_id.item()] if self.id2label is not None else class_id.item()
metrics[f"map_{class_name}"] = class_map
metrics[f"mar_100_{class_name}"] = class_mar
metrics = {k: round(v.item(), 4) for k, v in metrics.items()}
# Reset metric for next evaluation
self.reset_metric()
return metrics
def setup_logging(training_args: TrainingArguments) -> None:
"""Setup logging according to `training_args`."""
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
if training_args.should_log:
# The default of training_args.log_level is passive, so we set log level at info here to have that default.
transformers.utils.logging.set_verbosity_info()
log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
transformers.utils.logging.set_verbosity(log_level)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
def find_last_checkpoint(training_args: TrainingArguments) -> Optional[str]:
"""Find the last checkpoint in the output directory according to parameters specified in `training_args`."""
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
elif os.path.isdir(training_args.output_dir) and not training_args.overwrite_output_dir:
checkpoint = get_last_checkpoint(training_args.output_dir)
if checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. "
"Use --overwrite_output_dir to overcome."
)
elif checkpoint is not None and training_args.resume_from_checkpoint is None:
logger.info(
f"Checkpoint detected, resuming training at {checkpoint}. To avoid this behavior, change "
"the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
)
return checkpoint
def main():
# See all possible arguments in https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments
# or by passing the --help flag to this script.
parser = HfArgumentParser([Arguments, TrainingArguments])
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
args, training_args = parser.parse_args_into_dataclasses()
# Set default training arguments for instance segmentation
training_args.eval_do_concat_batches = False
training_args.batch_eval_metrics = True
training_args.remove_unused_columns = False
# # Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
# # information sent is the one passed as arguments along with your Python/PyTorch versions.
send_example_telemetry("run_instance_segmentation", args)
# Setup logging and log on each process the small summary:
setup_logging(training_args)
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}, "
+ f"distributed training: {training_args.parallel_mode.value == 'distributed'}, 16-bits training: {training_args.fp16}"
)
logger.info(f"Training/evaluation parameters {training_args}")
# Load last checkpoint from output_dir if it exists (and we are not overwriting it)
checkpoint = find_last_checkpoint(training_args)
# ------------------------------------------------------------------------------------------------
# Load dataset, prepare splits
# ------------------------------------------------------------------------------------------------
dataset = load_dataset(args.dataset_name)
# We need to specify the label2id mapping for the model
# it is a mapping from semantic class name to class index.
# In case your dataset does not provide it, you can create it manually:
# label2id = {"background": 0, "cat": 1, "dog": 2}
label2id = dataset["train"][0]["semantic_class_to_id"]
if args.do_reduce_labels:
label2id = {name: idx for name, idx in label2id.items() if idx != 0} # remove background class
label2id = {name: idx - 1 for name, idx in label2id.items()} # shift class indices by -1
id2label = {v: k for k, v in label2id.items()}
# ------------------------------------------------------------------------------------------------
# Load pretrained config, model and image processor
# ------------------------------------------------------------------------------------------------
model = AutoModelForUniversalSegmentation.from_pretrained(
args.model_name_or_path,
label2id=label2id,
id2label=id2label,
ignore_mismatched_sizes=True,
token=args.token,
)
image_processor = AutoImageProcessor.from_pretrained(
args.model_name_or_path,
do_resize=True,
size={"height": args.image_height, "width": args.image_width},
do_reduce_labels=args.do_reduce_labels,
reduce_labels=args.do_reduce_labels, # TODO: remove when mask2former support `do_reduce_labels`
token=args.token,
)
# ------------------------------------------------------------------------------------------------
# Define image augmentations and dataset transforms
# ------------------------------------------------------------------------------------------------
train_augment_and_transform = A.Compose(
[
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.5),
A.HueSaturationValue(p=0.1),
],
)
validation_transform = A.Compose(
[A.NoOp()],
)
# Make transform functions for batch and apply for dataset splits
train_transform_batch = partial(
augment_and_transform_batch, transform=train_augment_and_transform, image_processor=image_processor
)
validation_transform_batch = partial(
augment_and_transform_batch, transform=validation_transform, image_processor=image_processor
)
dataset["train"] = dataset["train"].with_transform(train_transform_batch)
dataset["validation"] = dataset["validation"].with_transform(validation_transform_batch)
# ------------------------------------------------------------------------------------------------
# Model training and evaluation with Trainer API
# ------------------------------------------------------------------------------------------------
compute_metrics = Evaluator(image_processor=image_processor, id2label=id2label, threshold=0.0)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset["train"] if training_args.do_train else None,
eval_dataset=dataset["validation"] if training_args.do_eval else None,
tokenizer=image_processor,
data_collator=collate_fn,
compute_metrics=compute_metrics,
)
# Training
if training_args.do_train:
train_result = trainer.train(resume_from_checkpoint=checkpoint)
trainer.save_model()
trainer.log_metrics("train", train_result.metrics)
trainer.save_metrics("train", train_result.metrics)
trainer.save_state()
# Final evaluation
if training_args.do_eval:
metrics = trainer.evaluate(eval_dataset=dataset["validation"], metric_key_prefix="test")
trainer.log_metrics("test", metrics)
trainer.save_metrics("test", metrics)
# Write model card and (optionally) push to hub
kwargs = {
"finetuned_from": args.model_name_or_path,
"dataset": args.dataset_name,
"tags": ["image-segmentation", "instance-segmentation", "vision"],
}
if training_args.push_to_hub:
trainer.push_to_hub(**kwargs)
else:
trainer.create_model_card(**kwargs)
if __name__ == "__main__":
main()

View File

@ -1,734 +0,0 @@
#!/usr/bin/env python
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
"""Finetuning 🤗 Transformers model for instance segmentation with Accelerate 🚀."""
import argparse
import json
import logging
import math
import os
import sys
from functools import partial
from pathlib import Path
from typing import Any, Mapping
import albumentations as A
import datasets
import numpy as np
import torch
from accelerate import Accelerator
from accelerate.utils import set_seed
from datasets import load_dataset
from huggingface_hub import HfApi
from torch.utils.data import DataLoader
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from tqdm import tqdm
import transformers
from transformers import (
AutoImageProcessor,
AutoModelForUniversalSegmentation,
SchedulerType,
get_scheduler,
)
from transformers.image_processing_utils import BatchFeature
from transformers.utils import check_min_version, send_example_telemetry
from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.42.0.dev0")
require_version("datasets>=2.0.0", "To fix: pip install -r examples/pytorch/instance-segmentation/requirements.txt")
def parse_args():
parser = argparse.ArgumentParser(description="Finetune a transformers model for instance segmentation task")
parser.add_argument(
"--model_name_or_path",
type=str,
help="Path to a pretrained model or model identifier from huggingface.co/models.",
default="facebook/mask2former-swin-tiny-coco-instance",
)
parser.add_argument(
"--dataset_name",
type=str,
help="Name of the dataset on the hub.",
default="qubvel-hf/ade20k-mini",
)
parser.add_argument(
"--image_height",
type=int,
default=384,
help="The height of the images to feed the model.",
)
parser.add_argument(
"--image_width",
type=int,
default=384,
help="The width of the images to feed the model.",
)
parser.add_argument(
"--do_reduce_labels",
action="store_true",
help="Whether to reduce the number of labels by removing the background class.",
)
parser.add_argument(
"--cache_dir",
type=str,
help="Path to a folder in which the model and dataset will be cached.",
)
parser.add_argument(
"--per_device_train_batch_size",
type=int,
default=8,
help="Batch size (per device) for the training dataloader.",
)
parser.add_argument(
"--per_device_eval_batch_size",
type=int,
default=8,
help="Batch size (per device) for the evaluation dataloader.",
)
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=4,
help="Number of workers to use for the dataloaders.",
)
parser.add_argument(
"--learning_rate",
type=float,
default=5e-5,
help="Initial learning rate (after the potential warmup period) to use.",
)
parser.add_argument(
"--adam_beta1",
type=float,
default=0.9,
help="Beta1 for AdamW optimizer",
)
parser.add_argument(
"--adam_beta2",
type=float,
default=0.999,
help="Beta2 for AdamW optimizer",
)
parser.add_argument(
"--adam_epsilon",
type=float,
default=1e-8,
help="Epsilon for AdamW optimizer",
)
parser.add_argument("--num_train_epochs", type=int, default=3, help="Total number of training epochs to perform.")
parser.add_argument(
"--max_train_steps",
type=int,
default=None,
help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
)
parser.add_argument(
"--gradient_accumulation_steps",
type=int,
default=1,
help="Number of updates steps to accumulate before performing a backward/update pass.",
)
parser.add_argument(
"--lr_scheduler_type",
type=SchedulerType,
default="linear",
help="The scheduler type to use.",
choices=["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"],
)
parser.add_argument(
"--num_warmup_steps", type=int, default=0, help="Number of steps for the warmup in the lr scheduler."
)
parser.add_argument("--output_dir", type=str, default=None, help="Where to store the final model.")
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument(
"--hub_model_id", type=str, help="The name of the repository to keep in sync with the local `output_dir`."
)
parser.add_argument("--hub_token", type=str, help="The token to use to push to the Model Hub.")
parser.add_argument(
"--checkpointing_steps",
type=str,
default=None,
help="Whether the various states should be saved at the end of every n steps, or 'epoch' for each epoch.",
)
parser.add_argument(
"--resume_from_checkpoint",
type=str,
default=None,
help="If the training should continue from a checkpoint folder.",
)
parser.add_argument(
"--with_tracking",
required=False,
action="store_true",
help="Whether to enable experiment trackers for logging.",
)
parser.add_argument(
"--report_to",
type=str,
default="all",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations. '
"Only applicable when `--with_tracking` is passed."
),
)
args = parser.parse_args()
# Sanity checks
if args.push_to_hub or args.with_tracking:
if args.output_dir is None:
raise ValueError(
"Need an `output_dir` to create a repo when `--push_to_hub` or `with_tracking` is specified."
)
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
return args
def augment_and_transform_batch(
examples: Mapping[str, Any], transform: A.Compose, image_processor: AutoImageProcessor
) -> BatchFeature:
batch = {
"pixel_values": [],
"mask_labels": [],
"class_labels": [],
}
for pil_image, pil_annotation in zip(examples["image"], examples["annotation"]):
image = np.array(pil_image)
semantic_and_instance_masks = np.array(pil_annotation)[..., :2]
# Apply augmentations
output = transform(image=image, mask=semantic_and_instance_masks)
aug_image = output["image"]
aug_semantic_and_instance_masks = output["mask"]
aug_instance_mask = aug_semantic_and_instance_masks[..., 1]
# Create mapping from instance id to semantic id
unique_semantic_id_instance_id_pairs = np.unique(aug_semantic_and_instance_masks.reshape(-1, 2), axis=0)
instance_id_to_semantic_id = {
instance_id: semantic_id for semantic_id, instance_id in unique_semantic_id_instance_id_pairs
}
# Apply the image processor transformations: resizing, rescaling, normalization
model_inputs = image_processor(
images=[aug_image],
segmentation_maps=[aug_instance_mask],
instance_id_to_semantic_id=instance_id_to_semantic_id,
return_tensors="pt",
)
batch["pixel_values"].append(model_inputs.pixel_values[0])
batch["mask_labels"].append(model_inputs.mask_labels[0])
batch["class_labels"].append(model_inputs.class_labels[0])
return batch
def collate_fn(examples):
batch = {}
batch["pixel_values"] = torch.stack([example["pixel_values"] for example in examples])
batch["class_labels"] = [example["class_labels"] for example in examples]
batch["mask_labels"] = [example["mask_labels"] for example in examples]
if "pixel_mask" in examples[0]:
batch["pixel_mask"] = torch.stack([example["pixel_mask"] for example in examples])
return batch
def nested_cpu(tensors):
if isinstance(tensors, (list, tuple)):
return type(tensors)(nested_cpu(t) for t in tensors)
elif isinstance(tensors, Mapping):
return type(tensors)({k: nested_cpu(t) for k, t in tensors.items()})
elif isinstance(tensors, torch.Tensor):
return tensors.cpu().detach()
else:
return tensors
def evaluation_loop(model, image_processor, accelerator: Accelerator, dataloader, id2label):
metric = MeanAveragePrecision(iou_type="segm", class_metrics=True)
for inputs in tqdm(dataloader, total=len(dataloader), disable=not accelerator.is_local_main_process):
with torch.no_grad():
outputs = model(**inputs)
inputs = accelerator.gather_for_metrics(inputs)
inputs = nested_cpu(inputs)
outputs = accelerator.gather_for_metrics(outputs)
outputs = nested_cpu(outputs)
# For metric computation we need to provide:
# - targets in a form of list of dictionaries with keys "masks", "labels"
# - predictions in a form of list of dictionaries with keys "masks", "labels", "scores"
post_processed_targets = []
post_processed_predictions = []
target_sizes = []
# Collect targets
for masks, labels in zip(inputs["mask_labels"], inputs["class_labels"]):
post_processed_targets.append(
{
"masks": masks.to(dtype=torch.bool),
"labels": labels,
}
)
target_sizes.append(masks.shape[-2:])
# Collect predictions
post_processed_output = image_processor.post_process_instance_segmentation(
outputs,
threshold=0.0,
target_sizes=target_sizes,
return_binary_maps=True,
)
for image_predictions, target_size in zip(post_processed_output, target_sizes):
if image_predictions["segments_info"]:
post_processed_image_prediction = {
"masks": image_predictions["segmentation"].to(dtype=torch.bool),
"labels": torch.tensor([x["label_id"] for x in image_predictions["segments_info"]]),
"scores": torch.tensor([x["score"] for x in image_predictions["segments_info"]]),
}
else:
# for void predictions, we need to provide empty tensors
post_processed_image_prediction = {
"masks": torch.zeros([0, *target_size], dtype=torch.bool),
"labels": torch.tensor([]),
"scores": torch.tensor([]),
}
post_processed_predictions.append(post_processed_image_prediction)
# Update metric for batch targets and predictions
metric.update(post_processed_predictions, post_processed_targets)
# Compute metrics
metrics = metric.compute()
# Replace list of per class metrics with separate metric for each class
classes = metrics.pop("classes")
map_per_class = metrics.pop("map_per_class")
mar_100_per_class = metrics.pop("mar_100_per_class")
for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):
class_name = id2label[class_id.item()] if id2label is not None else class_id.item()
metrics[f"map_{class_name}"] = class_map
metrics[f"mar_100_{class_name}"] = class_mar
metrics = {k: round(v.item(), 4) for k, v in metrics.items()}
return metrics
def setup_logging(accelerator: Accelerator) -> None:
"""Setup logging according to `training_args`."""
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_info()
logger.setLevel(logging.INFO)
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
def handle_repository_creation(accelerator: Accelerator, args: argparse.Namespace):
"""Create a repository for the model and dataset if `args.push_to_hub` is set."""
repo_id = None
if accelerator.is_main_process:
if args.push_to_hub:
# Retrieve or infer repo_name
repo_name = args.hub_model_id
if repo_name is None:
repo_name = Path(args.output_dir).absolute().name
# Create repo and retrieve repo_id
api = HfApi()
repo_id = api.create_repo(repo_name, exist_ok=True, token=args.hub_token).repo_id
with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
if "step_*" not in gitignore:
gitignore.write("step_*\n")
if "epoch_*" not in gitignore:
gitignore.write("epoch_*\n")
elif args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
accelerator.wait_for_everyone()
return repo_id
def main():
args = parse_args()
# Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
# information sent is the one passed as arguments along with your Python/PyTorch versions.
send_example_telemetry("run_instance_segmentation_no_trainer", args)
# Initialize the accelerator. We will let the accelerator handle device placement for us in this example.
# If we're using tracking, we also need to initialize it here and it will by default pick up all supported trackers
# in the environment
accelerator_log_kwargs = {}
if args.with_tracking:
accelerator_log_kwargs["log_with"] = args.report_to
accelerator_log_kwargs["project_dir"] = args.output_dir
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, **accelerator_log_kwargs)
setup_logging(accelerator)
# If passed along, set the training seed now.
# We set device_specific to True as we want different data augmentation per device.
if args.seed is not None:
set_seed(args.seed, device_specific=True)
# Create repository if push to hub is specified
repo_id = handle_repository_creation(accelerator, args)
if args.push_to_hub:
api = HfApi()
# ------------------------------------------------------------------------------------------------
# Load dataset, prepare splits
# ------------------------------------------------------------------------------------------------
# In distributed training, the load_dataset function guarantees that only one local process can concurrently
# download the dataset.
dataset = load_dataset(args.dataset_name, cache_dir=args.cache_dir)
# We need to specify the label2id mapping for the model
# it is a mapping from semantic class name to class index.
# In case your dataset does not provide it, you can create it manually:
# label2id = {"background": 0, "cat": 1, "dog": 2}
label2id = dataset["train"][0]["semantic_class_to_id"]
if args.do_reduce_labels:
label2id = {name: idx for name, idx in label2id.items() if idx != 0} # remove background class
label2id = {name: idx - 1 for name, idx in label2id.items()} # shift class indices by -1
id2label = {v: k for k, v in label2id.items()}
# ------------------------------------------------------------------------------------------------
# Load pretrained model and image processor
# ------------------------------------------------------------------------------------------------
model = AutoModelForUniversalSegmentation.from_pretrained(
args.model_name_or_path,
label2id=label2id,
id2label=id2label,
ignore_mismatched_sizes=True,
token=args.hub_token,
)
image_processor = AutoImageProcessor.from_pretrained(
args.model_name_or_path,
do_resize=True,
size={"height": args.image_height, "width": args.image_width},
do_reduce_labels=args.do_reduce_labels,
reduce_labels=args.do_reduce_labels, # TODO: remove when mask2former support `do_reduce_labels`
token=args.hub_token,
)
# ------------------------------------------------------------------------------------------------
# Define image augmentations and dataset transforms
# ------------------------------------------------------------------------------------------------
train_augment_and_transform = A.Compose(
[
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.5),
A.HueSaturationValue(p=0.1),
],
)
validation_transform = A.Compose(
[A.NoOp()],
)
# Make transform functions for batch and apply for dataset splits
train_transform_batch = partial(
augment_and_transform_batch, transform=train_augment_and_transform, image_processor=image_processor
)
validation_transform_batch = partial(
augment_and_transform_batch, transform=validation_transform, image_processor=image_processor
)
with accelerator.main_process_first():
dataset["train"] = dataset["train"].with_transform(train_transform_batch)
dataset["validation"] = dataset["validation"].with_transform(validation_transform_batch)
dataloader_common_args = {
"num_workers": args.dataloader_num_workers,
"persistent_workers": True,
"collate_fn": collate_fn,
}
train_dataloader = DataLoader(
dataset["train"], shuffle=True, batch_size=args.per_device_train_batch_size, **dataloader_common_args
)
valid_dataloader = DataLoader(
dataset["validation"], shuffle=False, batch_size=args.per_device_eval_batch_size, **dataloader_common_args
)
# ------------------------------------------------------------------------------------------------
# Define optimizer, scheduler and prepare everything with the accelerator
# ------------------------------------------------------------------------------------------------
# Optimizer
optimizer = torch.optim.AdamW(
list(model.parameters()),
lr=args.learning_rate,
betas=[args.adam_beta1, args.adam_beta2],
eps=args.adam_epsilon,
)
# Figure out how many steps we should save the Accelerator states
checkpointing_steps = args.checkpointing_steps
if checkpointing_steps is not None and checkpointing_steps.isdigit():
checkpointing_steps = int(checkpointing_steps)
# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
overrode_max_train_steps = True
lr_scheduler = get_scheduler(
name=args.lr_scheduler_type,
optimizer=optimizer,
num_warmup_steps=args.num_warmup_steps * accelerator.num_processes,
num_training_steps=args.max_train_steps
if overrode_max_train_steps
else args.max_train_steps * accelerator.num_processes,
)
# Prepare everything with our `accelerator`.
model, optimizer, train_dataloader, valid_dataloader, lr_scheduler = accelerator.prepare(
model, optimizer, train_dataloader, valid_dataloader, lr_scheduler
)
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if overrode_max_train_steps:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
# Afterwards we recalculate our number of training epochs
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
# We need to initialize the trackers we use, and also store our configuration.
# The trackers initializes automatically on the main process.
if args.with_tracking:
experiment_config = vars(args)
# TensorBoard cannot log Enums, need the raw value
experiment_config["lr_scheduler_type"] = experiment_config["lr_scheduler_type"].value
accelerator.init_trackers("instance_segmentation_no_trainer", experiment_config)
# ------------------------------------------------------------------------------------------------
# Run training with evaluation on each epoch
# ------------------------------------------------------------------------------------------------
total_batch_size = args.per_device_train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(dataset['train'])}")
logger.info(f" Num Epochs = {args.num_train_epochs}")
logger.info(f" Instantaneous batch size per device = {args.per_device_train_batch_size}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
logger.info(f" Total optimization steps = {args.max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
completed_steps = 0
starting_epoch = 0
# Potentially load in the weights and states from a previous save
if args.resume_from_checkpoint:
if args.resume_from_checkpoint is not None or args.resume_from_checkpoint != "":
checkpoint_path = args.resume_from_checkpoint
path = os.path.basename(args.resume_from_checkpoint)
else:
# Get the most recent checkpoint
dirs = [f.name for f in os.scandir(os.getcwd()) if f.is_dir()]
dirs.sort(key=os.path.getctime)
path = dirs[-1] # Sorts folders by date modified, most recent checkpoint is the last
checkpoint_path = path
path = os.path.basename(checkpoint_path)
accelerator.print(f"Resumed from checkpoint: {checkpoint_path}")
accelerator.load_state(checkpoint_path)
# Extract `epoch_{i}` or `step_{i}`
training_difference = os.path.splitext(path)[0]
if "epoch" in training_difference:
starting_epoch = int(training_difference.replace("epoch_", "")) + 1
resume_step = None
completed_steps = starting_epoch * num_update_steps_per_epoch
else:
# need to multiply `gradient_accumulation_steps` to reflect real steps
resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps
starting_epoch = resume_step // len(train_dataloader)
completed_steps = resume_step // args.gradient_accumulation_steps
resume_step -= starting_epoch * len(train_dataloader)
# update the progress_bar if load from checkpoint
progress_bar.update(completed_steps)
for epoch in range(starting_epoch, args.num_train_epochs):
model.train()
if args.with_tracking:
total_loss = 0
if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None:
# We skip the first `n` batches in the dataloader when resuming from a checkpoint
active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step)
else:
active_dataloader = train_dataloader
for step, batch in enumerate(active_dataloader):
with accelerator.accumulate(model):
outputs = model(**batch)
loss = outputs.loss
# We keep track of the loss at each epoch
if args.with_tracking:
total_loss += loss.detach().float()
accelerator.backward(loss)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
completed_steps += 1
if isinstance(checkpointing_steps, int):
if completed_steps % checkpointing_steps == 0:
output_dir = f"step_{completed_steps}"
if args.output_dir is not None:
output_dir = os.path.join(args.output_dir, output_dir)
accelerator.save_state(output_dir)
if args.push_to_hub and epoch < args.num_train_epochs - 1:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
args.output_dir,
is_main_process=accelerator.is_main_process,
save_function=accelerator.save,
)
if accelerator.is_main_process:
image_processor.save_pretrained(args.output_dir)
api.upload_folder(
repo_id=repo_id,
commit_message=f"Training in progress epoch {epoch}",
folder_path=args.output_dir,
repo_type="model",
token=args.hub_token,
)
if completed_steps >= args.max_train_steps:
break
logger.info("***** Running evaluation *****")
metrics = evaluation_loop(model, image_processor, accelerator, valid_dataloader, id2label)
logger.info(f"epoch {epoch}: {metrics}")
if args.with_tracking:
accelerator.log(
{
"train_loss": total_loss.item() / len(train_dataloader),
**metrics,
"epoch": epoch,
"step": completed_steps,
},
step=completed_steps,
)
if args.push_to_hub and epoch < args.num_train_epochs - 1:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
args.output_dir, is_main_process=accelerator.is_main_process, save_function=accelerator.save
)
if accelerator.is_main_process:
image_processor.save_pretrained(args.output_dir)
api.upload_folder(
commit_message=f"Training in progress epoch {epoch}",
folder_path=args.output_dir,
repo_id=repo_id,
repo_type="model",
token=args.hub_token,
)
if args.checkpointing_steps == "epoch":
output_dir = f"epoch_{epoch}"
if args.output_dir is not None:
output_dir = os.path.join(args.output_dir, output_dir)
accelerator.save_state(output_dir)
# ------------------------------------------------------------------------------------------------
# Run evaluation on test dataset and save the model
# ------------------------------------------------------------------------------------------------
logger.info("***** Running evaluation on test dataset *****")
metrics = evaluation_loop(model, image_processor, accelerator, valid_dataloader, id2label)
metrics = {f"test_{k}": v for k, v in metrics.items()}
logger.info(f"Test metrics: {metrics}")
if args.with_tracking:
accelerator.end_training()
if args.output_dir is not None:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
args.output_dir, is_main_process=accelerator.is_main_process, save_function=accelerator.save
)
if accelerator.is_main_process:
with open(os.path.join(args.output_dir, "all_results.json"), "w") as f:
json.dump(metrics, f, indent=2)
image_processor.save_pretrained(args.output_dir)
if args.push_to_hub:
api.upload_folder(
commit_message="End of training",
folder_path=args.output_dir,
repo_id=repo_id,
repo_type="model",
token=args.hub_token,
ignore_patterns=["epoch_*"],
)
if __name__ == "__main__":
main()
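The script relies on `augment_and_transform_batch`, `collate_fn`, and `evaluation_loop`, which are defined earlier in the file and not shown in this hunk. For orientation only, here is a minimal sketch of what the transform/collate pair could look like for a Mask2Former-style image processor; the dataset column names (`image`, `annotation`) and the exact label layout are assumptions, not the script's actual implementation.

```py
# Hypothetical sketch of the helpers referenced above -- not the script's actual code.
import numpy as np
import torch


def augment_and_transform_batch(examples, transform, image_processor):
    # Assumed dataset columns: "image" (RGB image) and "annotation" (instance mask).
    images, masks = [], []
    for image, annotation in zip(examples["image"], examples["annotation"]):
        augmented = transform(image=np.array(image), mask=np.array(annotation))
        images.append(augmented["image"])
        masks.append(augmented["mask"])
    # Mask2Former-style processors accept `segmentation_maps` and return
    # `pixel_values` plus per-image `mask_labels` and `class_labels`.
    return image_processor(images=images, segmentation_maps=masks, return_tensors="pt")


def collate_fn(examples):
    # Pixel values can be stacked; mask/class labels stay as per-sample lists
    # because the number of instances varies from image to image.
    batch = {"pixel_values": torch.stack([x["pixel_values"] for x in examples])}
    batch["mask_labels"] = [x["mask_labels"] for x in examples]
    batch["class_labels"] = [x["class_labels"] for x in examples]
    return batch
```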

View File

@ -355,28 +355,3 @@ class ExamplesTestsNoTrainer(TestCasePlus):
run_command(self._launch_args + testargs)
result = get_results(tmp_dir)
self.assertGreaterEqual(result["test_map"], 0.10)
@slow
@mock.patch.dict(os.environ, {"WANDB_MODE": "offline", "DVCLIVE_TEST": "true"})
def test_run_instance_segmentation_no_trainer(self):
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py
--model_name_or_path qubvel-hf/finetune-instance-segmentation-ade20k-mini-mask2former
--output_dir {tmp_dir}
--dataset_name qubvel-hf/ade20k-nano
--do_reduce_labels
--image_height 256
--image_width 256
--num_train_epochs 1
--per_device_train_batch_size 2
--per_device_eval_batch_size 1
--seed 1234
""".split()
run_command(self._launch_args + testargs)
result = get_results(tmp_dir)
self.assertGreaterEqual(result["test_map"], 0.1)

View File

@ -49,7 +49,6 @@ SRC_DIRS = [
"image-pretraining",
"semantic-segmentation",
"object-detection",
"instance-segmentation",
]
]
sys.path.extend(SRC_DIRS)
@ -61,7 +60,6 @@ if SRC_DIRS is not None:
import run_generation
import run_glue
import run_image_classification
import run_instance_segmentation
import run_mae
import run_mlm
import run_ner
@ -641,33 +639,3 @@ class ExamplesTests(TestCasePlus):
run_object_detection.main()
result = get_results(tmp_dir)
self.assertGreaterEqual(result["test_map"], 0.1)
@patch.dict(os.environ, {"WANDB_DISABLED": "true"})
def test_run_instance_segmentation(self):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_instance_segmentation.py
--model_name_or_path qubvel-hf/finetune-instance-segmentation-ade20k-mini-mask2former
--output_dir {tmp_dir}
--dataset_name qubvel-hf/ade20k-nano
--do_reduce_labels
--image_height 256
--image_width 256
--do_train
--num_train_epochs 1
--learning_rate 1e-5
--lr_scheduler_type constant
--per_device_train_batch_size 2
--per_device_eval_batch_size 1
--do_eval
--evaluation_strategy epoch
--seed 32
""".split()
if is_torch_fp16_available_on_device(torch_device):
testargs.append("--fp16")
with patch.object(sys, "argv", testargs):
run_instance_segmentation.main()
result = get_results(tmp_dir)
self.assertGreaterEqual(result["test_map"], 0.1)

File diff suppressed because it is too large

View File

@ -26,7 +26,7 @@ from .agent_types import AgentAudio, AgentImage, AgentText
from .default_tools import BASE_PYTHON_TOOLS, FinalAnswerTool, setup_default_tools
from .llm_engine import HfEngine, MessageRole
from .prompts import DEFAULT_CODE_SYSTEM_PROMPT, DEFAULT_REACT_CODE_SYSTEM_PROMPT, DEFAULT_REACT_JSON_SYSTEM_PROMPT
from .python_interpreter import LIST_SAFE_MODULES, evaluate_python_code
from .python_interpreter import evaluate_python_code
from .tools import (
DEFAULT_TOOL_DESCRIPTION_TEMPLATE,
Tool,
@ -84,14 +84,8 @@ def parse_json_blob(json_blob: str) -> Dict[str, str]:
return json_data
except json.JSONDecodeError as e:
place = e.pos
if json_blob[place - 1 : place + 2] == "},\n":
raise ValueError(
"JSON is invalid: you probably tried to provide multiple tool calls in one action. PROVIDE ONLY ONE TOOL CALL."
)
raise ValueError(
f"The JSON blob you used is invalid due to the following error: {e}.\n"
f"JSON blob was: {json_blob}, decoding failed on that specific part of the blob:\n"
f"'{json_blob[place-4:place+5]}'."
f"The JSON blob you used is invalid: due to the following error: {e}. JSON blob was: {json_blob}, decoding failed at '{json_blob[place-4:place+5]}'."
)
except Exception as e:
raise ValueError(f"Error in parsing the JSON blob: {e}")
@ -353,7 +347,6 @@ class Agent:
return self._toolbox
def initialize_for_run(self, task: str, **kwargs):
self.token_count = 0
self.task = task
if len(kwargs) > 0:
self.task += f"\nYou have been provided with these initial arguments: {str(kwargs)}."
@ -387,7 +380,7 @@ class Agent:
message_content = (
"Error: "
+ str(step_log["error"])
+ "\nNow let's retry: take care not to repeat previous errors! If you have retried several times, try a completely different approach.\n"
+ "\nNow let's retry: take care not to repeat previous errors! Try to adopt different approaches.\n"
)
elif "observation" in step_log:
message_content = f"Observation: {step_log['observation']}"
@ -416,9 +409,6 @@ class Agent:
)
return memory
def get_succinct_logs(self):
return [{key: value for key, value in log.items() if key != "agent_memory"} for log in self.logs]
def extract_action(self, llm_output: str, split_token: str) -> str:
"""
Parse action from the LLM output
@ -496,7 +486,6 @@ class CodeAgent(Agent):
llm_engine: Callable = HfEngine(),
system_prompt: str = DEFAULT_CODE_SYSTEM_PROMPT,
tool_description_template: str = DEFAULT_TOOL_DESCRIPTION_TEMPLATE,
additional_authorized_imports: List[str] = [],
**kwargs,
):
super().__init__(
@ -515,7 +504,6 @@ class CodeAgent(Agent):
)
self.python_evaluator = evaluate_python_code
self.additional_authorized_imports = additional_authorized_imports
def parse_code_blob(self, result: str) -> str:
"""
@ -556,7 +544,7 @@ class CodeAgent(Agent):
self.prompt = [prompt_message, task_message]
self.logger.info("====Executing with this prompt====")
self.logger.info(self.prompt)
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_action>"])
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_code>"])
if return_generated_code:
return llm_output
@ -575,12 +563,7 @@ class CodeAgent(Agent):
self.log_code_action(code_action)
try:
available_tools = {**BASE_PYTHON_TOOLS.copy(), **self.toolbox.tools}
output = self.python_evaluator(
code_action,
available_tools,
state=self.state,
authorized_imports=LIST_SAFE_MODULES + self.additional_authorized_imports,
)
output = self.python_evaluator(code_action, available_tools, state=self.state)
self.logger.info(self.state["print_outputs"])
return output
except Exception as e:
@ -614,29 +597,7 @@ class ReactAgent(Agent):
if "final_answer" not in self._toolbox.tools:
self._toolbox.add_tool(FinalAnswerTool())
def provide_final_answer(self, task) -> str:
"""
This method provides a final answer to the task, based on the logs of the agent's interactions.
"""
self.prompt = [
{
"role": MessageRole.SYSTEM,
"content": "An agent tried to answer an user query but it got stuck and failed to do so. You are tasked with providing an answer instead. Here is the agent's memory:",
}
]
self.prompt += self.write_inner_memory_from_logs()[1:]
self.prompt += [
{
"role": MessageRole.USER,
"content": f"Based on the above, please provide an answer to the following user request:\n{task}",
}
]
try:
return self.llm_engine(self.prompt)
except Exception as e:
return f"Error in generating final llm output: {e}."
def run(self, task: str, stream: bool = False, **kwargs):
def run(self, task: str, **kwargs):
"""
Runs the agent for the given task.
@ -653,49 +614,13 @@ class ReactAgent(Agent):
agent.run("What is the result of 2 power 3.7384?")
```
"""
if stream:
return self.stream_run(task, **kwargs)
else:
return self.direct_run(task, **kwargs)
def stream_run(self, task: str, **kwargs):
self.initialize_for_run(task, **kwargs)
final_answer = None
iteration = 0
while final_answer is None and iteration < self.max_iterations:
try:
step_logs = self.step()
if "final_answer" in step_logs:
final_answer = step_logs["final_answer"]
except AgentError as e:
self.logger.error(e, exc_info=1)
self.logs[-1]["error"] = e
finally:
iteration += 1
yield self.logs[-1]
if final_answer is None and iteration == self.max_iterations:
error_message = "Reached max iterations."
final_step_log = {"error": AgentMaxIterationsError(error_message)}
self.logs.append(final_step_log)
self.logger.error(error_message, exc_info=1)
final_answer = self.provide_final_answer(task)
final_step_log["final_answer"] = final_answer
yield final_step_log
yield final_answer
def direct_run(self, task: str, **kwargs):
self.initialize_for_run(task, **kwargs)
final_answer = None
iteration = 0
while final_answer is None and iteration < self.max_iterations:
try:
step_logs = self.step()
if "final_answer" in step_logs:
final_answer = step_logs["final_answer"]
final_answer = self.step()
except AgentError as e:
self.logger.error(e, exc_info=1)
self.logs[-1]["error"] = e
@ -704,11 +629,26 @@ class ReactAgent(Agent):
if final_answer is None and iteration == self.max_iterations:
error_message = "Reached max iterations."
final_step_log = {"error": AgentMaxIterationsError(error_message)}
self.logs.append(final_step_log)
self.logs.append({"error": AgentMaxIterationsError(error_message)})
self.logger.error(error_message, exc_info=1)
final_answer = self.provide_final_answer(task)
final_step_log["final_answer"] = final_answer
self.prompt = [
{
"role": MessageRole.SYSTEM,
"content": "An agent tried to answer a user query but it failed to do so. You are tasked with providing an answer instead. Here is the agent's memory:",
}
]
self.prompt += self.write_inner_memory_from_logs()[1:]
self.prompt += [
{
"role": MessageRole.USER,
"content": f"Based on the above, please provide an answer to the following user request:\n{task}",
}
]
try:
final_answer = self.llm_engine(self.prompt, stop_sequences=["Observation:"])
except Exception as e:
final_answer = f"Error in generating final llm output: {e}."
return final_answer
@ -743,24 +683,22 @@ class ReactJsonAgent(ReactAgent):
"""
agent_memory = self.write_inner_memory_from_logs()
self.logs[-1]["agent_memory"] = agent_memory.copy()
self.prompt = agent_memory
self.logger.debug("===== New step =====")
# Add new step in logs
current_step_logs = {}
self.logs.append(current_step_logs)
current_step_logs["agent_memory"] = agent_memory.copy()
self.logs.append({})
self.logger.info("===== Calling LLM with this last message: =====")
self.logger.info(self.prompt[-1])
try:
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_action>", "Observation:"])
llm_output = self.llm_engine(self.prompt, stop_sequences=["Observation:"])
except Exception as e:
raise AgentGenerationError(f"Error in generating llm output: {e}.")
self.logger.debug("===== Output message of the LLM: =====")
self.logger.debug(llm_output)
current_step_logs["llm_output"] = llm_output
self.logs[-1]["llm_output"] = llm_output
# Parse
self.logger.debug("===== Extracting action =====")
@ -771,8 +709,8 @@ class ReactJsonAgent(ReactAgent):
except Exception as e:
raise AgentParsingError(f"Could not parse the given action: {e}.")
current_step_logs["rationale"] = rationale
current_step_logs["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments}
self.logs[-1]["rationale"] = rationale
self.logs[-1]["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments}
# Execute
self.logger.warning(f"Calling tool: '{tool_name}' with arguments: {arguments}")
@ -783,8 +721,7 @@ class ReactJsonAgent(ReactAgent):
answer = arguments
if answer in self.state: # if the answer is a state variable, return the value
answer = self.state[answer]
current_step_logs["final_answer"] = answer
return current_step_logs
return answer
else:
observation = self.execute_tool_call(tool_name, arguments)
observation_type = type(observation)
@ -803,8 +740,8 @@ class ReactJsonAgent(ReactAgent):
updated_information = f"Stored '{observation_name}' in memory."
self.logger.info(updated_information)
current_step_logs["observation"] = updated_information
return current_step_logs
self.logs[-1]["observation"] = updated_information
return None
class ReactCodeAgent(ReactAgent):
@ -820,7 +757,6 @@ class ReactCodeAgent(ReactAgent):
llm_engine: Callable = HfEngine(),
system_prompt: str = DEFAULT_REACT_CODE_SYSTEM_PROMPT,
tool_description_template: str = DEFAULT_TOOL_DESCRIPTION_TEMPLATE,
additional_authorized_imports: List[str] = [],
**kwargs,
):
super().__init__(
@ -839,7 +775,6 @@ class ReactCodeAgent(ReactAgent):
)
self.python_evaluator = evaluate_python_code
self.additional_authorized_imports = additional_authorized_imports
def step(self):
"""
@ -847,27 +782,26 @@ class ReactCodeAgent(ReactAgent):
The errors are raised here, they are caught and logged in the run() method.
"""
agent_memory = self.write_inner_memory_from_logs()
self.logs[-1]["agent_memory"] = agent_memory.copy()
self.prompt = agent_memory.copy()
self.logger.debug("===== New step =====")
# Add new step in logs
current_step_logs = {}
self.logs.append(current_step_logs)
current_step_logs["agent_memory"] = agent_memory.copy()
self.logs.append({})
self.logger.info("===== Calling LLM with these last messages: =====")
self.logger.info(self.prompt[-2:])
try:
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_action>", "Observation:"])
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_code>", "Observation:"])
except Exception as e:
raise AgentGenerationError(f"Error in generating llm output: {e}.")
self.logger.debug("===== Output message of the LLM: =====")
self.logger.debug(llm_output)
current_step_logs["llm_output"] = llm_output
self.logs[-1]["llm_output"] = llm_output
# Parse
self.logger.debug("===== Extracting action =====")
@ -879,23 +813,18 @@ class ReactCodeAgent(ReactAgent):
error_msg = f"Error in code parsing: {e}. Make sure to provide correct code"
raise AgentParsingError(error_msg)
current_step_logs["rationale"] = rationale
current_step_logs["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action}
self.logs[-1]["rationale"] = rationale
self.logs[-1]["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action}
# Execute
self.log_code_action(code_action)
try:
available_tools = {**BASE_PYTHON_TOOLS.copy(), **self.toolbox.tools}
result = self.python_evaluator(
code_action,
available_tools,
state=self.state,
authorized_imports=LIST_SAFE_MODULES + self.additional_authorized_imports,
)
result = self.python_evaluator(code_action, available_tools, state=self.state)
information = self.state["print_outputs"]
self.logger.warning("Print outputs:")
self.logger.log(32, information)
current_step_logs["observation"] = information
self.logs[-1]["observation"] = information
except Exception as e:
error_msg = f"Failed while trying to execute the code below:\n{CustomFormatter.reset + code_action + CustomFormatter.reset}\nThis failed due to the following error:\n{str(e)}"
if "'dict' object has no attribute 'read'" in str(e):
@ -905,5 +834,5 @@ class ReactCodeAgent(ReactAgent):
if line[: len("final_answer")] == "final_answer":
self.logger.warning(">>> Final answer:")
self.logger.log(32, result)
current_step_logs["final_answer"] = result
return current_step_logs
return result
return None
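Taken together, these changes affect how a `ReactCodeAgent` step is driven and logged. For orientation, a minimal usage sketch is given below; the import path and constructor arguments are assumptions inferred from the surrounding code, not something this diff pins down.

```py
# Hypothetical usage sketch -- import path and constructor arguments are assumed.
from transformers.agents import HfEngine, ReactCodeAgent

llm_engine = HfEngine()  # assumed to default to a hosted instruct model
agent = ReactCodeAgent(tools=[], llm_engine=llm_engine)

# The agent iterates Thought / Code / Observation steps until final_answer() is called.
result = agent.run("What is the result of 2 power 3.7384?")
print(result)
```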

View File

@ -61,6 +61,7 @@ def get_clean_message_list(message_list: List[Dict[str, str]], role_conversions:
llama_role_conversions = {
MessageRole.SYSTEM: MessageRole.USER,
MessageRole.TOOL_RESPONSE: MessageRole.USER,
}
@ -71,14 +72,20 @@ class HfEngine:
self.client = InferenceClient(model=self.model, timeout=120)
def __call__(self, messages: List[Dict[str, str]], stop_sequences=[]) -> str:
if "Meta-Llama-3" in self.model:
if "<|eot_id|>" not in stop_sequences:
stop_sequences.append("<|eot_id|>")
if "!!!!!" not in stop_sequences:
stop_sequences.append("!!!!!")
# Get clean message list
messages = get_clean_message_list(messages, role_conversions=llama_role_conversions)
# Get LLM output
# Get answer
response = self.client.chat_completion(messages, stop=stop_sequences, max_tokens=1500)
response = response.choices[0].message.content
# Remove stop sequences from LLM output
# Remove stop sequences from the answer
for stop_seq in stop_sequences:
if response[-len(stop_seq) :] == stop_seq:
response = response[: -len(stop_seq)]
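The suffix handling above trims any stop sequence that the completion happens to end with. A self-contained illustration of that logic:

```py
# Pure-Python illustration of the stop-sequence stripping shown above.
response = "Thought: done.<end_code>"
for stop_seq in ["<end_code>", "Observation:"]:
    if response[-len(stop_seq):] == stop_seq:
        response = response[: -len(stop_seq)]
print(response)  # "Thought: done."
```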

View File

@ -68,7 +68,7 @@ translated_question = translator(question=question, src_lang="French", tgt_lang=
print(f"The translated question is {translated_question}.")
answer = image_qa(image=image, question=translated_question)
print(f"The answer is {answer}")
```<end_action>
```<end_code>
---
Task: "Identify the oldest person in the `document` and create an image showcasing the result."
@ -79,7 +79,7 @@ Code:
answer = document_qa(document, question="What is the oldest person?")
print(f"The answer is {answer}.")
image = image_generator(answer)
```<end_action>
```<end_code>
---
Task: "Generate an image using the text given in the variable `caption`."
@ -88,7 +88,7 @@ I will use the following tool: `image_generator` to generate an image.
Code:
```py
image = image_generator(prompt=caption)
```<end_action>
```<end_code>
---
Task: "Summarize the text given in the variable `text` and read it out loud."
@ -99,7 +99,7 @@ Code:
summarized_text = summarizer(text)
print(f"Summary: {summarized_text}")
audio_summary = text_reader(summarized_text)
```<end_action>
```<end_code>
---
Task: "Answer the question in the variable `question` about the text in the variable `text`. Use the answer to generate an image."
@ -110,7 +110,7 @@ Code:
answer = text_qa(text=text, question=question)
print(f"The answer is {answer}.")
image = image_generator(answer)
```<end_action>
```<end_code>
---
Task: "Caption the following `image`."
@ -119,32 +119,39 @@ I will use the following tool: `image_captioner` to generate a caption for the i
Code:
```py
caption = image_captioner(image)
```<end_action>
```<end_code>
---
The above examples were using tools that might not exist for you. You only have access to these tools:
<<tool_names>>
Remember to make sure that variables you use are all defined.
Be sure to provide a 'Code:\n```' sequence before the code and '```<end_action>' after, else you will get an error.
Be sure to provide a 'Code:\n```' sequence before the code and '```<end_code>' after, else you will get an error.
DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'.
Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
Now Begin!
"""
DEFAULT_REACT_JSON_SYSTEM_PROMPT = """You will be given a task to solve as best you can. To do so, you have been given access to the following tools: <<tool_names>>
The way you use the tools is by specifying a json blob, ending with '<end_action>'.
Specifically, this json should have an `action` key (name of the tool to use) and an `action_input` key (input to the tool).
DEFAULT_REACT_JSON_SYSTEM_PROMPT = """You will be given a task to solve as best you can. You have access to the following tools:
<<tool_descriptions>>
The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (name of the tool to use) and a `action_input` key (input to the tool).
The $ACTION_JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. It should be formatted in json. Do not try to escape special characters. Here is the template of a valid $ACTION_JSON_BLOB:
Action:
{
"action": $TOOL_NAME,
"action_input": $INPUT
}<end_action>
}
Make sure to have the $INPUT as a dictionary in the right format for the tool you are using, and do not put variable names as input if you can find the right values.
You will be given:
Task: the task you are given.
You should ALWAYS use the following format:
Thought: you should always think about one action to take. Then use the action as follows:
@ -164,14 +171,14 @@ Action:
{
"action": "image_transformer",
"action_input": {"image": "image_1.jpg"}
}<end_action>
}
To provide the final answer to the task, use an action blob with "action": "final_answer" tool. It is the only way to complete the task, else you will be stuck on a loop. So your final output should look like this:
Action:
{
"action": "final_answer",
"action_input": {"answer": "insert your final answer here"}
}<end_action>
}
Here are a few examples using notional tools:
@ -183,7 +190,7 @@ Action:
{
"action": "document_qa",
"action_input": {"document": "document.pdf", "question": "Who is the oldest person mentioned?"}
}<end_action>
}
Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."
@ -192,7 +199,7 @@ Action:
{
"action": "image_generator",
"action_input": {"text": ""A portrait of John Doe, a 55-year-old man living in Canada.""}
}<end_action>
}
Observation: "image.png"
Thought: I will now return the generated image.
@ -200,7 +207,7 @@ Action:
{
"action": "final_answer",
"action_input": "image.png"
}<end_action>
}
---
Task: "What is the result of the following operation: 5 + 3 + 1294.678?"
@ -210,7 +217,7 @@ Action:
{
"action": "python_interpreter",
"action_input": {"code": "5 + 3 + 1294.678"}
}<end_action>
}
Observation: 1302.678
Thought: Now that I know the result, I will now return it.
@ -218,7 +225,7 @@ Action:
{
"action": "final_answer",
"action_input": "1302.678"
}<end_action>
}
---
Task: "Which city has the highest population , Guangzhou or Shanghai?"
@ -228,7 +235,7 @@ Action:
{
"action": "search",
"action_input": "Population Guangzhou"
}<end_action>
}
Observation: ['Guangzhou has a population of 15 million inhabitants as of 2021.']
@ -245,30 +252,28 @@ Action:
{
"action": "final_answer",
"action_input": "Shanghai"
}<end_action>
}
The above examples were using notional tools that might not exist for you. You only have access to these tools:
<<tool_descriptions>>
<<tool_names>>
ALWAYS provide a 'Thought:' and an 'Action:' sequence. You MUST provide at least the 'Action:' sequence to move forward.
Here are the rules you should always follow to solve your task:
1. ALWAYS provide a 'Thought:' sequence, and an 'Action:' sequence that ends with <end_action>, else you will fail.
2. Always use the right arguments for the tools. Never use variable names in the 'action_input' field, use the value instead.
3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
4. Never re-do a tool call that you previously did with the exact same parameters.
Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
Now begin!
"""
DEFAULT_REACT_CODE_SYSTEM_PROMPT = """You will be given a task to solve as best you can.
To do so, you have been given access to *tools*: these tools are basically Python functions which you can call with code.
You have access to the following tools:
<<tool_descriptions>>
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.
At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_action>' sequence.
At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task, then the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '/End code' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
These print outputs will then be available in the 'Observation:' field, for using this information as input for the next step.
In the end you have to return a final answer using the `final_answer` tool.
Here are a few examples using notional tools:
@ -280,7 +285,7 @@ Code:
```py
answer = document_qa(document=document, question="Who is the oldest person mentioned?")
print(answer)
```<end_action>
```<end_code>
Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."
Thought: I will now generate an image showcasing the oldest person.
@ -289,7 +294,7 @@ Code:
```py
image = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")
final_answer(image)
```<end_action>
```<end_code>
---
Task: "What is the result of the following operation: 5 + 3 + 1294.678?"
@ -300,10 +305,10 @@ Code:
```py
result = 5 + 3 + 1294.678
final_answer(result)
```<end_action>
```<end_code>
---
Task: "Which city has the highest population: Guangzhou or Shanghai?"
Task: "Which city has the highest population , Guangzhou or Shanghai?"
Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.
Code:
@ -312,7 +317,7 @@ population_guangzhou = search("Guangzhou population")
print("Population Guangzhou:", population_guangzhou)
population_shanghai = search("Shanghai population")
print("Population Shanghai:", population_shanghai)
```<end_action>
```<end_code>
Observation:
Population Guangzhou: ['Guangzhou has a population of 15 million inhabitants as of 2021.']
Population Shanghai: '26 million (2019)'
@ -321,7 +326,7 @@ Thought: Now I know that Shanghai has the highest population.
Code:
```py
final_answer("Shanghai")
```<end_action>
```<end_code>
---
Task: "What is the current age of the pope, raised to the power 0.36?"
@ -331,7 +336,7 @@ Code:
```py
pope_age = search(query="current pope age")
print("Pope age:", pope_age)
```<end_action>
```<end_code>
Observation:
Pope age: "The pope Francis is currently 85 years old."
@ -340,21 +345,20 @@ Code:
```py
pope_current_age = 85 ** 0.36
final_answer(pope_current_age)
```<end_action>
```<end_code>
The above examples were using notional tools that might not exist for you. You only have access to these tools:
<<tool_names>>
You also can perform computations in the python code you generate.
<<tool_descriptions>>
Always provide a 'Thought:' and a 'Code:\n```py' sequence ending with '```<end_code>' sequence. You MUST provide at least the 'Code:' sequence to move forward.
You also can perform computations in the Python code that you generate.
Remember to not perform too many operations in a single code block! You should split the task into intermediate code blocks.
Print results at the end of each step to save the intermediate results. Then use final_answer() to return the final result.
Here are the rules you should always follow to solve your task:
1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_action>' sequence, else you will fail.
2. Use only variables that you have defined!
3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'.
4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
Remember to make sure that variables you use are all defined.
DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'.
Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
Now Begin!
"""

View File

@ -15,10 +15,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import ast
import builtins
import difflib
from collections.abc import Mapping
from typing import Any, Callable, Dict, List, Optional
from typing import Any, Callable, Dict, Optional
class InterpretorError(ValueError):
@ -30,25 +29,7 @@ class InterpretorError(ValueError):
pass
ERRORS = {
name: getattr(builtins, name)
for name in dir(builtins)
if isinstance(getattr(builtins, name), type) and issubclass(getattr(builtins, name), BaseException)
}
LIST_SAFE_MODULES = [
"random",
"collections",
"math",
"time",
"queue",
"itertools",
"re",
"stat",
"statistics",
"unicodedata",
]
LIST_SAFE_MODULES = ["random", "math", "time", "queue", "itertools", "re", "stat", "statistics", "unicodedata"]
class BreakException(Exception):
@ -106,62 +87,21 @@ def evaluate_while(while_loop, state, tools):
return None
def create_function(func_def, state, tools):
def new_func(*args, **kwargs):
func_state = state.copy()
arg_names = [arg.arg for arg in func_def.args.args]
for name, value in zip(arg_names, args):
func_state[name] = value
if func_def.args.vararg:
vararg_name = func_def.args.vararg.arg
func_state[vararg_name] = args
if func_def.args.kwarg:
kwarg_name = func_def.args.kwarg.arg
func_state[kwarg_name] = kwargs
def evaluate_function_def(function_def, state, tools):
def create_function(func_def, state, tools):
def new_func(*args):
new_state = state.copy()
for arg, val in zip(func_def.args.args, args):
new_state[arg.arg] = val
result = None
for node in func_def.body:
result = evaluate_ast(node, new_state, tools)
return result
# Update function state with self and __class__
if func_def.args.args and func_def.args.args[0].arg == "self":
if args:
func_state["self"] = args[0]
func_state["__class__"] = args[0].__class__
return new_func
result = None
for stmt in func_def.body:
result = evaluate_ast(stmt, func_state, tools)
return result
return new_func
def create_class(class_name, class_bases, class_body):
class_dict = {}
for key, value in class_body.items():
class_dict[key] = value
return type(class_name, tuple(class_bases), class_dict)
def evaluate_function_def(func_def, state, tools):
tools[func_def.name] = create_function(func_def, state, tools)
return tools[func_def.name]
def evaluate_class_def(class_def, state, tools):
class_name = class_def.name
bases = [evaluate_ast(base, state, tools) for base in class_def.bases]
class_dict = {}
for stmt in class_def.body:
if isinstance(stmt, ast.FunctionDef):
class_dict[stmt.name] = evaluate_function_def(stmt, state, tools)
elif isinstance(stmt, ast.Assign):
for target in stmt.targets:
class_dict[target.id] = evaluate_ast(stmt.value, state, tools)
else:
raise InterpretorError(f"Unsupported statement in class body: {stmt.__class__.__name__}")
new_class = type(class_name, tuple(bases), class_dict)
state[class_name] = new_class
return new_class
tools[function_def.name] = create_function(function_def, state, tools)
return None
def evaluate_augassign(expression: ast.AugAssign, state: Dict[str, Any], tools: Dict[str, Callable]):
@ -236,20 +176,11 @@ def evaluate_assign(assign, state, tools):
var_names = assign.targets
result = evaluate_ast(assign.value, state, tools)
if len(var_names) == 1:
target = var_names[0]
if isinstance(target, ast.Tuple):
for i, elem in enumerate(target.elts):
if isinstance(var_names[0], ast.Tuple):
for i, elem in enumerate(var_names[0].elts):
state[elem.id] = result[i]
elif isinstance(target, ast.Attribute):
obj = evaluate_ast(target.value, state, tools)
setattr(obj, target.attr, result)
elif isinstance(target, ast.Subscript):
obj = evaluate_ast(target.value, state, tools)
key = evaluate_ast(target.slice, state, tools)
obj[key] = result
else:
state[target.id] = result
state[var_names[0].id] = result
else:
if len(result) != len(var_names):
raise InterpretorError(f"Expected {len(var_names)} values but got {len(result)}.")
@ -259,64 +190,41 @@ def evaluate_assign(assign, state, tools):
def evaluate_call(call, state, tools):
if not (isinstance(call.func, ast.Attribute) or isinstance(call.func, ast.Name)):
raise InterpretorError(
f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func})."
)
if isinstance(call.func, ast.Attribute):
obj = evaluate_ast(call.func.value, state, tools)
func_name = call.func.attr
if not hasattr(obj, func_name):
raise InterpretorError(f"Object {obj} has no attribute {func_name}")
func = getattr(obj, func_name)
args = [evaluate_ast(arg, state, tools) for arg in call.args]
kwargs = {keyword.arg: evaluate_ast(keyword.value, state, tools) for keyword in call.keywords}
return func(*args, **kwargs)
elif isinstance(call.func, ast.Name):
func_name = call.func.id
if func_name in state:
func = state[func_name]
elif func_name in tools:
func = tools[func_name]
elif func_name in ERRORS:
func = ERRORS[func_name]
else:
raise InterpretorError(
f"It is not permitted to evaluate other functions than the provided tools or imported functions (tried to execute {call.func.id})."
)
# Todo deal with args
args = [evaluate_ast(arg, state, tools) for arg in call.args]
kwargs = {keyword.arg: evaluate_ast(keyword.value, state, tools) for keyword in call.keywords}
output = func(*args, **kwargs)
args = [evaluate_ast(arg, state, tools) for arg in call.args]
kwargs = {keyword.arg: evaluate_ast(keyword.value, state, tools) for keyword in call.keywords}
# store logs of print statements
if func_name == "print":
state["print_outputs"] += output + "\n"
if isinstance(func, type) and len(func.__module__.split(".")) > 1: # Check for user-defined classes
# Instantiate the class using its constructor
obj = func.__new__(func) # Create a new instance of the class
if hasattr(obj, "__init__"): # Check if the class has an __init__ method
obj.__init__(*args, **kwargs) # Call the __init__ method correctly
return obj
return output
else:
if func_name == "super":
if not args:
if "__class__" in state and "self" in state:
return super(state["__class__"], state["self"])
else:
raise InterpretorError("super() needs at least one argument")
cls = args[0]
if not isinstance(cls, type):
raise InterpretorError("super() argument 1 must be type")
if len(args) == 1:
return super(cls)
elif len(args) == 2:
instance = args[1]
return super(cls, instance)
else:
raise InterpretorError("super() takes at most 2 arguments")
else:
if func_name == "print":
output = " ".join(map(str, args))
state["print_outputs"] += output + "\n"
return output
else: # Assume it's a callable object
output = func(*args, **kwargs)
return output
raise InterpretorError(
f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func})."
)
def evaluate_subscript(subscript, state, tools):
@ -340,10 +248,6 @@ def evaluate_subscript(subscript, state, tools):
def evaluate_name(name, state, tools):
if name.id in state:
return state[name.id]
elif name.id in tools:
return tools[name.id]
elif name.id in ERRORS:
return ERRORS[name.id]
close_matches = difflib.get_close_matches(name.id, list(state.keys()))
if len(close_matches) > 0:
return state[close_matches[0]]
@ -403,11 +307,7 @@ def evaluate_for(for_loop, state, tools):
result = None
iterator = evaluate_ast(for_loop.iter, state, tools)
for counter in iterator:
if isinstance(for_loop.target, ast.Tuple):
for i, elem in enumerate(for_loop.target.elts):
state[elem.id] = counter[i]
else:
state[for_loop.target.id] = counter
state[for_loop.target.id] = counter
for node in for_loop.body:
try:
line_result = evaluate_ast(node, state, tools)
@ -437,56 +337,7 @@ def evaluate_listcomp(listcomp, state, tools):
return result
def evaluate_try(try_node, state, tools):
try:
for stmt in try_node.body:
evaluate_ast(stmt, state, tools)
except Exception as e:
matched = False
for handler in try_node.handlers:
if handler.type is None or isinstance(e, evaluate_ast(handler.type, state, tools)):
matched = True
if handler.name:
state[handler.name] = e
for stmt in handler.body:
evaluate_ast(stmt, state, tools)
break
if not matched:
raise e
else:
if try_node.orelse:
for stmt in try_node.orelse:
evaluate_ast(stmt, state, tools)
finally:
if try_node.finalbody:
for stmt in try_node.finalbody:
evaluate_ast(stmt, state, tools)
def evaluate_raise(raise_node, state, tools):
if raise_node.exc is not None:
exc = evaluate_ast(raise_node.exc, state, tools)
else:
exc = None
if raise_node.cause is not None:
cause = evaluate_ast(raise_node.cause, state, tools)
else:
cause = None
if exc is not None:
if cause is not None:
raise exc from cause
else:
raise exc
else:
raise InterpretorError("Re-raise is not supported without an active exception")
def evaluate_ast(
expression: ast.AST,
state: Dict[str, Any],
tools: Dict[str, Callable],
authorized_imports: List[str] = LIST_SAFE_MODULES,
):
def evaluate_ast(expression: ast.AST, state: Dict[str, Any], tools: Dict[str, Callable]):
"""
Evaluate an abstract syntax tree using the content of the variables stored in a state and only evaluating a given
set of functions.
@ -502,9 +353,6 @@ def evaluate_ast(
tools (`Dict[str, Callable]`):
The functions that may be called during the evaluation. Any call to another function will fail with an
`InterpretorError`.
authorized_imports (`List[str]`):
The list of modules that can be imported by the code. By default, only a few safe modules are allowed.
Add more at your own risk!
"""
if isinstance(expression, ast.Assign):
# Assignment -> we evaluate the assignment, which should update the state
@ -611,7 +459,7 @@ def evaluate_ast(
return result
elif isinstance(expression, ast.Import):
for alias in expression.names:
if alias.name in authorized_imports:
if alias.name in LIST_SAFE_MODULES:
module = __import__(alias.name)
state[alias.asname or alias.name] = module
else:
@ -620,27 +468,19 @@ def evaluate_ast(
elif isinstance(expression, ast.While):
return evaluate_while(expression, state, tools)
elif isinstance(expression, ast.ImportFrom):
if expression.module in authorized_imports:
if expression.module in LIST_SAFE_MODULES:
module = __import__(expression.module)
for alias in expression.names:
state[alias.asname or alias.name] = getattr(module, alias.name)
else:
raise InterpretorError(f"Import from {expression.module} is not allowed.")
return None
elif isinstance(expression, ast.ClassDef):
return evaluate_class_def(expression, state, tools)
elif isinstance(expression, ast.Try):
return evaluate_try(expression, state, tools)
elif isinstance(expression, ast.Raise):
return evaluate_raise(expression, state, tools)
else:
# For now we refuse anything else. Let's add things as we need them.
raise InterpretorError(f"{expression.__class__.__name__} is not supported.")
def evaluate_python_code(
code: str, tools: Optional[Dict[str, Callable]] = {}, state=None, authorized_imports: List[str] = LIST_SAFE_MODULES
):
def evaluate_python_code(code: str, tools: Optional[Dict[str, Callable]] = {}, state=None):
"""
Evaluate a python expression using the content of the variables stored in a state and only evaluating a given set
of functions.
@ -666,10 +506,9 @@ def evaluate_python_code(
state = {}
result = None
state["print_outputs"] = ""
for idx, node in enumerate(expression.body):
try:
line_result = evaluate_ast(node, state, tools, authorized_imports)
line_result = evaluate_ast(node, state, tools)
except InterpretorError as e:
msg = f"You tried to execute the following code:\n{code}\n"
msg += f"You got these outputs:\n{state['print_outputs']}\n"

View File

@ -185,7 +185,7 @@ class Tool:
"tool_class": full_name,
"description": self.description,
"name": self.name,
"inputs": self.inputs,
"inputs": str(self.inputs),
"output_type": str(self.output_type),
}
with open(config_file, "w", encoding="utf-8") as f:
@ -315,7 +315,7 @@ class Tool:
if tool_class.output_type != custom_tool["output_type"]:
tool_class.output_type = custom_tool["output_type"]
return tool_class(**kwargs)
return tool_class(model_repo_id, token=token, **kwargs)
def push_to_hub(
self,

View File

@ -1,21 +1,17 @@
import copy
import importlib.metadata
import json
import os
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple, Union
import torch
from packaging import version
from .configuration_utils import PretrainedConfig
from .utils import is_hqq_available, is_quanto_available, logging
if is_quanto_available():
quanto_version = version.parse(importlib.metadata.version("quanto"))
if quanto_version >= version.parse("0.2.0"):
from quanto import AffineQuantizer, MaxOptimizer, qint2, qint4
from quanto import QBitsTensor, qint2, qint4
if is_hqq_available():
from hqq.core.quantize import Quantizer as HQQQuantizer
@ -492,13 +488,6 @@ class QuantoQuantizedCache(QuantizedCache):
def __init__(self, cache_config: CacheConfig) -> None:
super().__init__(cache_config)
quanto_version = version.parse(importlib.metadata.version("quanto"))
if quanto_version < version.parse("0.2.0"):
raise ImportError(
f"You need quanto package version to be greater or equal than 0.2.0 to use `QuantoQuantizedCache`. Detected version {quanto_version}. "
f"Please upgrade quanto with `pip install -U quanto`"
)
if self.nbits not in [2, 4]:
raise ValueError(f"`nbits` for `quanto` backend has to be one of [`2`, `4`] but got {self.nbits}")
@ -511,11 +500,9 @@ class QuantoQuantizedCache(QuantizedCache):
)
self.qtype = qint4 if self.nbits == 4 else qint2
self.optimizer = MaxOptimizer() # hardcode as it's the only one for per-channel quantization
def _quantize(self, tensor, axis):
scale, zeropoint = self.optimizer(tensor, self.qtype.bits, axis, self.q_group_size)
qtensor = AffineQuantizer.apply(tensor, self.qtype, axis, self.q_group_size, scale, zeropoint)
qtensor = QBitsTensor.quantize(tensor, axis=axis, qtype=self.qtype, group_size=self.q_group_size)
return qtensor
def _dequantize(self, qtensor):

View File

@ -26,7 +26,6 @@ from ..utils import (
is_safetensors_available,
is_tf_available,
is_torch_available,
is_torch_npu_available,
)
from . import BaseTransformersCLICommand
@ -89,7 +88,6 @@ class EnvironmentCommand(BaseTransformersCLICommand):
pt_version = torch.__version__
pt_cuda_available = torch.cuda.is_available()
pt_npu_available = is_torch_npu_available()
tf_version = "not installed"
tf_cuda_available = "NA"
@ -131,16 +129,9 @@ class EnvironmentCommand(BaseTransformersCLICommand):
"Flax version (CPU?/GPU?/TPU?)": f"{flax_version} ({jax_backend})",
"Jax version": f"{jax_version}",
"JaxLib version": f"{jaxlib_version}",
"Using GPU in script?": "<fill in>",
"Using distributed or parallel set-up in script?": "<fill in>",
}
if is_torch_available():
if pt_cuda_available:
info["Using GPU in script?"] = "<fill in>"
info["GPU type"] = torch.cuda.get_device_name()
elif pt_npu_available:
info["Using NPU in script?"] = "<fill in>"
info["NPU type"] = torch.npu.get_device_name()
info["CANN version"] = torch.version.cann
print("\nCopy-and-paste the text below in your GitHub issue and FILL OUT the two last points.\n")
print(self.format_dict(info))

View File

@ -536,9 +536,9 @@ class PretrainedConfig(PushToHubMixin):
force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force to (re-)download the configuration files and override the cached versions if
they exist.
resume_download:
Deprecated and ignored. All downloads are now resumed by default when possible.
Will be removed in v5 of Transformers.
resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received file. Attempts to resume the download if such a file
exists.
proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.

View File

@ -198,10 +198,7 @@ def get_class_in_module(class_name: str, module_path: Union[str, os.PathLike]) -
Returns:
`typing.Type`: The class looked for.
"""
name = os.path.normpath(module_path)
if name.endswith(".py"):
name = name[:-3]
name = name.replace(os.path.sep, ".")
name = os.path.normpath(module_path).rstrip(".py").replace(os.path.sep, ".")
module_spec = importlib.util.spec_from_file_location(name, location=Path(HF_MODULES_CACHE) / module_path)
module = sys.modules.get(name)
if module is None:
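Note the behavioural difference between the two sides of this hunk: `str.rstrip(".py")` strips any run of the characters '.', 'p', 'y' from the end of the string, not the literal `.py` suffix. A quick illustration with a made-up module name:

```py
# rstrip removes characters from a set, not a literal suffix:
print("modeling_happy.py".rstrip(".py"))   # -> "modeling_ha"
print("modeling_happy.py"[: -len(".py")])  # -> "modeling_happy"
```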

View File

@ -823,8 +823,6 @@ class FlaxPreTrainedModel(PushToHubMixin, FlaxGenerationMixin):
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
if has_file(pretrained_model_name_or_path, SAFE_WEIGHTS_INDEX_NAME, **has_file_kwargs):
is_sharded = True

View File

@ -2864,8 +2864,6 @@ class TFPreTrainedModel(keras.Model, TFModelUtilsMixin, TFGenerationMixin, PushT
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
if has_file(pretrained_model_name_or_path, SAFE_WEIGHTS_INDEX_NAME, **has_file_kwargs):
is_sharded = True

View File

@ -3048,9 +3048,6 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
" ignored."
)
if gguf_file is not None and not is_accelerate_available():
raise ValueError("accelerate is required when loading a GGUF file `pip install accelerate`.")
if commit_hash is None:
if not isinstance(config, PretrainedConfig):
# We make a call to the config file first (which may be absent) to get the commit hash as soon as possible
@ -3395,75 +3392,70 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
)
if resolved_archive_file is not None:
is_sharded = True
if not local_files_only and not is_offline_mode():
if resolved_archive_file is not None:
if filename in [WEIGHTS_NAME, WEIGHTS_INDEX_NAME]:
# If the PyTorch file was found, check if there is a safetensors file on the repository
# If there is no safetensors file on the repositories, start an auto conversion
safe_weights_name = SAFE_WEIGHTS_INDEX_NAME if is_sharded else SAFE_WEIGHTS_NAME
has_file_kwargs = {
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
cached_file_kwargs = {
"cache_dir": cache_dir,
"force_download": force_download,
"resume_download": resume_download,
"local_files_only": local_files_only,
"user_agent": user_agent,
"subfolder": subfolder,
"_raise_exceptions_for_gated_repo": False,
"_raise_exceptions_for_missing_entries": False,
"_commit_hash": commit_hash,
**has_file_kwargs,
}
if not has_file(pretrained_model_name_or_path, safe_weights_name, **has_file_kwargs):
Thread(
target=auto_conversion,
args=(pretrained_model_name_or_path,),
kwargs={"ignore_errors_during_conversion": True, **cached_file_kwargs},
name="Thread-autoconversion",
).start()
else:
# Otherwise, no PyTorch file was found, maybe there is a TF or Flax model file.
# We try those to give a helpful error message.
if not local_files_only and resolved_archive_file is not None:
if filename in [WEIGHTS_NAME, WEIGHTS_INDEX_NAME]:
# If the PyTorch file was found, check if there is a safetensors file on the repository
# If there is no safetensors file on the repositories, start an auto conversion
safe_weights_name = SAFE_WEIGHTS_INDEX_NAME if is_sharded else SAFE_WEIGHTS_NAME
has_file_kwargs = {
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for TensorFlow weights."
" Use `from_tf=True` to load this model from those weights."
)
elif has_file(pretrained_model_name_or_path, FLAX_WEIGHTS_NAME, **has_file_kwargs):
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for Flax weights. Use"
" `from_flax=True` to load this model from those weights."
)
elif variant is not None and has_file(
pretrained_model_name_or_path, WEIGHTS_NAME, **has_file_kwargs
):
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file without the variant"
f" {variant}. Use `variant=None` to load this model from those weights."
)
else:
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
f" {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or {FLAX_WEIGHTS_NAME}."
)
cached_file_kwargs = {
"cache_dir": cache_dir,
"force_download": force_download,
"resume_download": resume_download,
"local_files_only": local_files_only,
"user_agent": user_agent,
"subfolder": subfolder,
"_raise_exceptions_for_gated_repo": False,
"_raise_exceptions_for_missing_entries": False,
"_commit_hash": commit_hash,
**has_file_kwargs,
}
if not has_file(pretrained_model_name_or_path, safe_weights_name, **has_file_kwargs):
Thread(
target=auto_conversion,
args=(pretrained_model_name_or_path,),
kwargs={"ignore_errors_during_conversion": True, **cached_file_kwargs},
name="Thread-autoconversion",
).start()
else:
# Otherwise, no PyTorch file was found, maybe there is a TF or Flax model file.
# We try those to give a helpful error message.
has_file_kwargs = {
"revision": revision,
"proxies": proxies,
"token": token,
}
if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for TensorFlow weights."
" Use `from_tf=True` to load this model from those weights."
)
elif has_file(pretrained_model_name_or_path, FLAX_WEIGHTS_NAME, **has_file_kwargs):
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for Flax weights. Use"
" `from_flax=True` to load this model from those weights."
)
elif variant is not None and has_file(
pretrained_model_name_or_path, WEIGHTS_NAME, **has_file_kwargs
):
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file without the variant"
f" {variant}. Use `variant=None` to load this model from those weights."
)
else:
raise EnvironmentError(
f"{pretrained_model_name_or_path} does not appear to have a file named"
f" {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
f" {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or {FLAX_WEIGHTS_NAME}."
)
except EnvironmentError:
# Raise any environment error raised by `cached_file`. It will have a helpful error message adapted
# to the original exception.
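The hunk above interleaves two versions of the checkpoint-fallback logic in `from_pretrained`: when a PyTorch `.bin` checkpoint is found but the repo has no safetensors counterpart, a background thread ("Thread-autoconversion") is started to convert it; when no PyTorch file is found at all, the code probes for TF/Flax weights so it can raise a more helpful error. A minimal sketch of the background-conversion step, with `check_remote_file` and `convert_repo` as hypothetical stand-ins for the `has_file` / `auto_conversion` helpers used above:

```python
# Illustrative sketch only, not the library implementation. `check_remote_file` and
# `convert_repo` are stand-ins for the has_file / auto_conversion helpers in the hunk;
# "model.safetensors" is assumed to be the single-file safetensors name.
from threading import Thread

def maybe_start_safetensors_conversion(repo_id, check_remote_file, convert_repo,
                                       safe_weights_name="model.safetensors"):
    # If the repo already ships safetensors weights, there is nothing to convert.
    if check_remote_file(repo_id, safe_weights_name):
        return None
    # Otherwise convert in the background so loading the .bin weights is not blocked.
    worker = Thread(
        target=convert_repo,
        args=(repo_id,),
        kwargs={"ignore_errors_during_conversion": True},
        name="Thread-autoconversion",
    )
    worker.start()
    return worker
```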

View File

@ -67,6 +67,7 @@ from . import (
deit,
deprecated,
depth_anything,
deta,
detr,
dialogpt,
dinat,
@ -76,11 +77,13 @@ from . import (
donut,
dpr,
dpt,
efficientformer,
efficientnet,
electra,
encodec,
encoder_decoder,
ernie,
ernie_m,
esm,
falcon,
fastspeech2_conformer,
@ -101,6 +104,8 @@ from . import (
gpt_neox_japanese,
gpt_sw3,
gptj,
gptsan_japanese,
graphormer,
grounding_dino,
groupvit,
herbert,
@ -113,6 +118,7 @@ from . import (
instructblip,
jamba,
jetmoe,
jukebox,
kosmos2,
layoutlm,
layoutlmv2,
@ -136,6 +142,7 @@ from . import (
maskformer,
mbart,
mbart50,
mega,
megatron_bert,
megatron_gpt2,
mgp_str,
@ -154,6 +161,8 @@ from . import (
musicgen,
musicgen_melody,
mvp,
nat,
nezha,
nllb,
nllb_moe,
nougat,
@ -181,9 +190,11 @@ from . import (
prophetnet,
pvt,
pvt_v2,
qdqbert,
qwen2,
qwen2_moe,
rag,
realm,
recurrent_gemma,
reformer,
regnet,
@ -204,6 +215,7 @@ from . import (
siglip,
speech_encoder_decoder,
speech_to_text,
speech_to_text_2,
speecht5,
splinter,
squeezebert,
@ -222,6 +234,7 @@ from . import (
timesformer,
timm_backbone,
trocr,
tvlt,
tvp,
udop,
umt5,
@ -237,6 +250,7 @@ from . import (
vision_text_dual_encoder,
visual_bert,
vit,
vit_hybrid,
vit_mae,
vit_msn,
vitdet,
@ -253,6 +267,7 @@ from . import (
x_clip,
xglm,
xlm,
xlm_prophetnet,
xlm_roberta,
xlm_roberta_xl,
xlnet,
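This hunk is the branch side of the deprecation move: modules such as deta, efficientformer, ernie_m, gptsan_japanese, graphormer, jukebox, mega, nat, nezha, qdqbert, realm, speech_to_text_2, tvlt, vit_hybrid and xlm_prophetnet are still imported directly under `transformers.models`, whereas main relocates them to `transformers.models.deprecated`. A purely illustrative probe for checking which layout a given installation uses:

```python
# Illustrative only: report whether a model module lives directly under
# transformers.models or under transformers.models.deprecated in this install.
import importlib

def find_model_module(name: str):
    for candidate in (f"transformers.models.{name}",
                      f"transformers.models.deprecated.{name}"):
        try:
            return importlib.import_module(candidate)
        except ImportError:
            continue
    raise ModuleNotFoundError(f"no module found for model type {name!r}")

# Example: find_model_module("deta").__name__ tells you which side of this diff you are on.
```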

View File

@ -585,29 +585,14 @@ MODEL_NAMES_MAPPING = OrderedDict(
# `transfo-xl` (as in `CONFIG_MAPPING_NAMES`), we should use `transfo_xl`.
DEPRECATED_MODELS = [
"bort",
"deta",
"efficientformer",
"ernie_m",
"gptsan_japanese",
"graphormer",
"jukebox",
"mctct",
"mega",
"mmbt",
"nat",
"nezha",
"open_llama",
"qdqbert",
"realm",
"retribert",
"speech_to_text_2",
"tapex",
"trajectory_transformer",
"transfo_xl",
"tvlt",
"van",
"vit_hybrid",
"xlm_prophetnet",
]
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict(
@ -631,11 +616,7 @@ def model_type_to_module_name(key):
"""Converts a config key to the corresponding module."""
# Special treatment
if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
key = SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
if key in DEPRECATED_MODELS:
key = f"deprecated.{key}"
return key
return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
key = key.replace("-", "_")
if key in DEPRECATED_MODELS:
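The two interleaved bodies above are the main and branch versions of `model_type_to_module_name`: one redirects deprecated model types to a `deprecated.<name>` module path inside the special-name branch, the other returns the special name as-is. A hedged reconstruction of the deprecated-aware variant, with the module-level mappings passed in explicitly rather than read from globals:

```python
# Hedged reconstruction for illustration; the real function reads the module-level
# SPECIAL_MODEL_TYPE_TO_MODULE_NAME and DEPRECATED_MODELS mappings instead of arguments.
def model_type_to_module_name(key, special_names, deprecated_models):
    """Convert a config key (e.g. "transfo-xl") to its module name (e.g. "deprecated.transfo_xl")."""
    if key in special_names:
        key = special_names[key]
        if key in deprecated_models:
            key = f"deprecated.{key}"
        return key
    key = key.replace("-", "_")
    if key in deprecated_models:
        key = f"deprecated.{key}"
    return key
```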

View File

@ -14,7 +14,7 @@
from typing import TYPE_CHECKING
from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
_import_structure = {
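This and most of the remaining hunks are the same mechanical change: every relative import gains or loses one leading dot because the files move between `models/<name>/` and `models/deprecated/<name>/`. The dot count can be sanity-checked without installing anything:

```python
# Illustrative only: importlib.util.resolve_name shows how the leading dots in the
# hunks above map back to transformers.utils from either package location.
from importlib.util import resolve_name

# File under transformers/models/deta/: three dots reach the package root.
print(resolve_name("...utils", package="transformers.models.deta"))
# -> transformers.utils

# Same file under transformers/models/deprecated/deta/: four dots are needed.
print(resolve_name("....utils", package="transformers.models.deprecated.deta"))
# -> transformers.utils
```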

View File

@ -14,9 +14,9 @@
# limitations under the License.
"""DETA model configuration"""
from ....configuration_utils import PretrainedConfig
from ....utils import logging
from ...auto import CONFIG_MAPPING
from ...configuration_utils import PretrainedConfig
from ...utils import logging
from ..auto import CONFIG_MAPPING
logger = logging.get_logger(__name__)

View File

@ -19,9 +19,9 @@ from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union
import numpy as np
from ....feature_extraction_utils import BatchFeature
from ....image_processing_utils import BaseImageProcessor, get_size_dict
from ....image_transforms import (
from ...feature_extraction_utils import BatchFeature
from ...image_processing_utils import BaseImageProcessor, get_size_dict
from ...image_transforms import (
PaddingMode,
center_to_corners_format,
corners_to_center_format,
@ -31,7 +31,7 @@ from ....image_transforms import (
rgb_to_id,
to_channel_dimension_format,
)
from ....image_utils import (
from ...image_utils import (
IMAGENET_DEFAULT_MEAN,
IMAGENET_DEFAULT_STD,
AnnotationFormat,
@ -48,7 +48,7 @@ from ....image_utils import (
validate_annotations,
validate_preprocess_arguments,
)
from ....utils import (
from ...utils import (
is_flax_available,
is_jax_tensor,
is_tf_available,
@ -59,7 +59,7 @@ from ....utils import (
is_vision_available,
logging,
)
from ....utils.generic import TensorType
from ...utils.generic import TensorType
if is_torch_available():

View File

@ -28,8 +28,8 @@ from torch import Tensor, nn
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from ....activations import ACT2FN
from ....file_utils import (
from ...activations import ACT2FN
from ...file_utils import (
ModelOutput,
add_start_docstrings,
add_start_docstrings_to_model_forward,
@ -38,12 +38,12 @@ from ....file_utils import (
is_vision_available,
replace_return_docstrings,
)
from ....modeling_attn_mask_utils import _prepare_4d_attention_mask
from ....modeling_outputs import BaseModelOutput
from ....modeling_utils import PreTrainedModel
from ....pytorch_utils import meshgrid
from ....utils import is_accelerate_available, is_ninja_available, is_torchvision_available, logging, requires_backends
from ....utils.backbone_utils import load_backbone
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask
from ...modeling_outputs import BaseModelOutput
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import meshgrid
from ...utils import is_accelerate_available, is_ninja_available, is_torchvision_available, logging, requires_backends
from ...utils.backbone_utils import load_backbone
from .configuration_deta import DetaConfig

View File

@ -71,6 +71,7 @@ _IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"
@dataclass
# Copied from transformers.models.nat.modeling_nat.NatEncoderOutput with Nat->Dinat
class DinatEncoderOutput(ModelOutput):
"""
Dinat encoder's outputs, with potential hidden states and attentions.
@ -104,6 +105,7 @@ class DinatEncoderOutput(ModelOutput):
@dataclass
# Copied from transformers.models.nat.modeling_nat.NatModelOutput with Nat->Dinat
class DinatModelOutput(ModelOutput):
"""
Dinat model's outputs that also contains a pooling of the last hidden states.
@ -140,6 +142,7 @@ class DinatModelOutput(ModelOutput):
@dataclass
# Copied from transformers.models.nat.modeling_nat.NatImageClassifierOutput with Nat->Dinat
class DinatImageClassifierOutput(ModelOutput):
"""
Dinat outputs for image classification.
@ -175,6 +178,7 @@ class DinatImageClassifierOutput(ModelOutput):
reshaped_hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
# Copied from transformers.models.nat.modeling_nat.NatEmbeddings with Nat->Dinat
class DinatEmbeddings(nn.Module):
"""
Construct the patch and position embeddings.
@ -197,6 +201,7 @@ class DinatEmbeddings(nn.Module):
return embeddings
# Copied from transformers.models.nat.modeling_nat.NatPatchEmbeddings with Nat->Dinat
class DinatPatchEmbeddings(nn.Module):
"""
This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial
@ -233,6 +238,7 @@ class DinatPatchEmbeddings(nn.Module):
return embeddings
# Copied from transformers.models.nat.modeling_nat.NatDownsampler with Nat->Dinat
class DinatDownsampler(nn.Module):
"""
Convolutional Downsampling Layer.
@ -315,6 +321,7 @@ class NeighborhoodAttention(nn.Module):
self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttention.transpose_for_scores with Nat->Dinat
def transpose_for_scores(self, x):
new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
x = x.view(new_x_shape)
@ -354,6 +361,7 @@ class NeighborhoodAttention(nn.Module):
return outputs
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionOutput
class NeighborhoodAttentionOutput(nn.Module):
def __init__(self, config, dim):
super().__init__()
@ -374,6 +382,7 @@ class NeighborhoodAttentionModule(nn.Module):
self.output = NeighborhoodAttentionOutput(config, dim)
self.pruned_heads = set()
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionModule.prune_heads
def prune_heads(self, heads):
if len(heads) == 0:
return
@ -392,6 +401,7 @@ class NeighborhoodAttentionModule(nn.Module):
self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
self.pruned_heads = self.pruned_heads.union(heads)
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionModule.forward
def forward(
self,
hidden_states: torch.Tensor,
@ -403,6 +413,7 @@ class NeighborhoodAttentionModule(nn.Module):
return outputs
# Copied from transformers.models.nat.modeling_nat.NatIntermediate with Nat->Dinat
class DinatIntermediate(nn.Module):
def __init__(self, config, dim):
super().__init__()
@ -418,6 +429,7 @@ class DinatIntermediate(nn.Module):
return hidden_states
# Copied from transformers.models.nat.modeling_nat.NatOutput with Nat->Dinat
class DinatOutput(nn.Module):
def __init__(self, config, dim):
super().__init__()
@ -527,6 +539,7 @@ class DinatStage(nn.Module):
self.pointing = False
# Copied from transformers.models.nat.modeling_nat.NatStage.forward
def forward(
self,
hidden_states: torch.Tensor,
@ -569,6 +582,7 @@ class DinatEncoder(nn.Module):
]
)
# Copied from transformers.models.nat.modeling_nat.NatEncoder.forward with Nat->Dinat
def forward(
self,
hidden_states: torch.Tensor,
@ -673,6 +687,7 @@ DINAT_INPUTS_DOCSTRING = r"""
"The bare Dinat Model transformer outputting raw hidden-states without any specific head on top.",
DINAT_START_DOCSTRING,
)
# Copied from transformers.models.nat.modeling_nat.NatModel with Nat->Dinat, NAT->DINAT
class DinatModel(DinatPreTrainedModel):
def __init__(self, config, add_pooling_layer=True):
super().__init__(config)
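The dinat hunk only toggles the `# Copied from transformers.models.nat.modeling_nat...` markers that link these classes to their nat counterparts; those comments are what the repository's copy-consistency script (utils/check_copies.py) keys on. A rough, purely illustrative sketch of what such a check does (the real tool also strips the marker lines and reformats code before comparing):

```python
# Rough illustration of a copy-consistency check: import the source object named in a
# "# Copied from" marker, apply the Nat->Dinat style replacements, and compare sources.
import importlib
import inspect

def copied_block_matches(target_obj, source_path, replacements):
    module_name, obj_name = source_path.rsplit(".", 1)
    source_obj = getattr(importlib.import_module(module_name), obj_name)
    expected = inspect.getsource(source_obj)
    for old, new in replacements:  # e.g. [("Nat", "Dinat"), ("NAT", "DINAT")]
        expected = expected.replace(old, new)
    return inspect.getsource(target_obj) == expected
```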

View File

@ -13,7 +13,7 @@
# limitations under the License.
from typing import TYPE_CHECKING
from ....utils import (
from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_tf_available,

View File

@ -16,8 +16,8 @@
from typing import List
from ....configuration_utils import PretrainedConfig
from ....utils import logging
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)

View File

@ -18,13 +18,13 @@ from typing import Dict, List, Optional, Union
import numpy as np
from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ....image_transforms import (
from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ...image_transforms import (
get_resize_output_image_size,
resize,
to_channel_dimension_format,
)
from ....image_utils import (
from ...image_utils import (
IMAGENET_DEFAULT_MEAN,
IMAGENET_DEFAULT_STD,
ChannelDimension,
@ -38,7 +38,7 @@ from ....image_utils import (
validate_kwargs,
validate_preprocess_arguments,
)
from ....utils import TensorType, logging
from ...utils import TensorType, logging
logger = logging.get_logger(__name__)

View File

@ -23,10 +23,10 @@ import torch.utils.checkpoint
from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ....activations import ACT2FN
from ....modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
from ....modeling_utils import PreTrainedModel
from ....utils import (
from ...activations import ACT2FN
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
from ...modeling_utils import PreTrainedModel
from ...utils import (
ModelOutput,
add_code_sample_docstrings,
add_start_docstrings,

View File

@ -20,13 +20,13 @@ from typing import Optional, Tuple, Union
import tensorflow as tf
from ....activations_tf import ACT2FN
from ....modeling_tf_outputs import (
from ...activations_tf import ACT2FN
from ...modeling_tf_outputs import (
TFBaseModelOutput,
TFBaseModelOutputWithPooling,
TFImageClassifierOutput,
)
from ....modeling_tf_utils import (
from ...modeling_tf_utils import (
TFPreTrainedModel,
TFSequenceClassificationLoss,
get_initializer,
@ -34,8 +34,8 @@ from ....modeling_tf_utils import (
keras_serializable,
unpack_inputs,
)
from ....tf_utils import shape_list, stable_softmax
from ....utils import (
from ...tf_utils import shape_list, stable_softmax
from ...utils import (
ModelOutput,
add_code_sample_docstrings,
add_start_docstrings,

View File

@ -14,7 +14,7 @@
from typing import TYPE_CHECKING
# rely on isort to merge the imports
from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
_import_structure = {

View File

@ -19,7 +19,7 @@ from __future__ import annotations
from typing import Dict
from ....configuration_utils import PretrainedConfig
from ...configuration_utils import PretrainedConfig
class ErnieMConfig(PretrainedConfig):

View File

@ -22,8 +22,8 @@ import torch.utils.checkpoint
from torch import nn, tensor
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ....activations import ACT2FN
from ....modeling_outputs import (
from ...activations import ACT2FN
from ...modeling_outputs import (
BaseModelOutputWithPastAndCrossAttentions,
BaseModelOutputWithPoolingAndCrossAttentions,
MultipleChoiceModelOutput,
@ -31,9 +31,9 @@ from ....modeling_outputs import (
SequenceClassifierOutput,
TokenClassifierOutput,
)
from ....modeling_utils import PreTrainedModel
from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ....utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
from .configuration_ernie_m import ErnieMConfig

View File

@ -21,8 +21,8 @@ from typing import Any, Dict, List, Optional, Tuple
import sentencepiece as spm
from ....tokenization_utils import PreTrainedTokenizer
from ....utils import logging
from ...tokenization_utils import PreTrainedTokenizer
from ...utils import logging
logger = logging.get_logger(__name__)

View File

@ -14,7 +14,7 @@
from typing import TYPE_CHECKING
from ....utils import (
from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_flax_available,

View File

@ -14,8 +14,8 @@
# limitations under the License.
"""GPTSAN-japanese model configuration"""
from ....configuration_utils import PretrainedConfig
from ....utils import logging
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)

View File

@ -20,10 +20,10 @@ from typing import List, Optional, Tuple, Union
import torch
import torch.nn as nn
from ....activations import ACT2FN
from ....modeling_outputs import MoECausalLMOutputWithPast, MoEModelOutputWithPastAndCrossAttentions
from ....modeling_utils import PreTrainedModel
from ....utils import (
from ...activations import ACT2FN
from ...modeling_outputs import MoECausalLMOutputWithPast, MoEModelOutputWithPastAndCrossAttentions
from ...modeling_utils import PreTrainedModel
from ...utils import (
DUMMY_INPUTS,
DUMMY_MASK,
add_start_docstrings,

View File

@ -22,8 +22,8 @@ from typing import List, Optional, Tuple, Union
import numpy as np
from ....tokenization_utils import PreTrainedTokenizer
from ....tokenization_utils_base import (
from ...tokenization_utils import PreTrainedTokenizer
from ...tokenization_utils_base import (
BatchEncoding,
PreTokenizedInput,
PreTokenizedInputPair,
@ -31,7 +31,7 @@ from ....tokenization_utils_base import (
TextInputPair,
TruncationStrategy,
)
from ....utils import PaddingStrategy, logging
from ...utils import PaddingStrategy, logging
logger = logging.get_logger(__name__)

Some files were not shown because too many files have changed in this diff.