From 3335724376319a0c453049d0cd883504f530ff52 Mon Sep 17 00:00:00 2001 From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Date: Tue, 9 May 2023 20:37:57 -0400 Subject: [PATCH] Test composition (#23214) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Remove nestedness in tool config * Really do it * Use remote tools descriptions * Work * Clean up eval * Changes * Tools * Tools * tool * Fix everything * Use last result/assign for evaluation * Prompt * Remove hardcoded selection * Evaluation for chat agents * correct some spelling * Small fixes * Change summarization model (#23172) * Fix link displayed * Update description of the tool * Fixes in chat prompt * Custom tools, custom prompt * Tool clean up * save_pretrained and push_to_hub for tool * Fix init * Tests * Fix tests * Tool save/from_hub/push_to_hub and tool->load_tool * Clean push_to_hub and add app file * Custom inference API for endpoints too * Clean up * old remote tool and new remote tool * Make a requirements * return_code adds tool creation * Avoid redundancy between global variables * Remote tools can be loaded * Tests * Text summarization tests * Quality * Properly mark tests * Test the python interpreter * And the CI shall be green. * fix loading of additional tools * Work on RemoteTool and fix tests * General clean up * Guard imports * Fix tools * docs: Fix broken link in 'How to add a model...' 
(#23216) fix link * Get default endpoint from the Hub * Add guide * Simplify tool config * Docs * Some fixes * Docs * Docs * Docs * Fix code returned by agent * Try this * Match args with signature in remote tool * Should fix python interpreter for Python 3.8 * Fix push_to_hub for tools * Other fixes to push_to_hub * Add API doc page * Docs * Docs * Custom tools * Pin tensorflow-probability (#23220) * Pin tensorflow-probability * [all-test] * [all-test] Fix syntax for bash * PoC for some chaining API * Text to speech * J'ai pris des libertés * Rename * Basic python interpreter * Add agents * Quality * Add translation tool * temp * GenQA + LID + S2T * Quality + word missing in translation * Add open assistance, support f-strings in evaluate * captioning + s2t fixes * Style * Refactor descriptions and remove chain * Support errors and rename OpenAssistantAgent * Add setup * Deal with typos + example of inference API * Some rename + README * Fixes * Update prompt * Unwanted change * Make sure everyone has a default * One prompt to rule them all. 
* SD * Description * Clean up remote tools * More remote tools * Add option to return code and update doc * Image segmentation * ControlNet * Gradio demo * Diffusers protection * Lib protection * ControlNet description * Cleanup * Style * Remove accelerate and try to be reproducible * No randomness * Male Basic optional in token * Clean description * Better prompts * Fix args eval in interpreter * Add tool wrapper * Tool on the Hub * Style post-rebase * Big refactor of descriptions, batch generation and evaluation for agents * Make problems easier - interface to debug * More problems, add python primitives * Back to one prompt * Remove dict for translation * Be consistent * Add prompts * New version of the agent * Evaluate new agents * New endpoints agents * Make all tools a dict variable * Typo * Add problems * Add to big prompt * Harmonize * Add tools * New evaluation * Add more tools * Build prompt with tools descriptions * Tools on the Hub * Let's chat! * Cleanup * Temporary bs4 safeguard * Cache agents and clean up * Blank init * Fix evaluation for agents * New format for tools on the Hub * Add method to reset state * Remove nestedness in tool config * Really do it * Use remote tools descriptions * Work * Clean up eval * Changes * Tools * Tools * tool * Fix everything * Use last result/assign for evaluation * Prompt * Remove hardcoded selection * Evaluation for chat agents * correct some spelling * Small fixes * Change summarization model (#23172) * Fix link displayed * Update description of the tool * Fixes in chat prompt * Custom tools, custom prompt * Tool clean up * save_pretrained and push_to_hub for tool * Fix init * Tests * Fix tests * Tool save/from_hub/push_to_hub and tool->load_tool * Clean push_to_hub and add app file * Custom inference API for endpoints too * Clean up * old remote tool and new remote tool * Make a requirements * return_code adds tool creation * Avoid redundancy between global variables * Remote tools can be loaded * Tests * Text 
summarization tests * Quality * Properly mark tests * Test the python interpreter * And the CI shall be green. * Work on RemoteTool and fix tests * fix loading of additional tools * General clean up * Guard imports * Fix tools * Get default endpoint from the Hub * Simplify tool config * Add guide * Docs * Some fixes * Docs * Docs * Fix code returned by agent * Try this * Docs * Match args with signature in remote tool * Should fix python interpreter for Python 3.8 * Fix push_to_hub for tools * Other fixes to push_to_hub * Add API doc page * Fixes * Doc fixes * Docs * Fix audio * Custom tools * Audio fix * Improve custom tools docstring * Docstrings * Trigger CI * Mode docstrings * More docstrings * Improve custom tools * Fix for remote tools * Style * Fix repo consistency * Quality * Tip * Cleanup on doc * Cleanup toc * Add disclaimer for starcoder vs openai * Remove disclaimer * Small fixed in the prompts * 4.29 * Update src/transformers/tools/agents.py Co-authored-by: Lysandre Debut * Complete documentation * Small fixes * Agent evaluation * Note about gradio-tools & LC * Clean up agents and prompt * Apply suggestions from code review Co-authored-by: Patrick von Platen * Apply suggestions from code review Co-authored-by: Patrick von Platen * Note about gradio-tools & LC * Add copyrights and address review comments * Quality * Add all language codes * Add remote tool tests * Move custom prompts to other docs * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * TTS tests * Quality --------- Co-authored-by: Lysandre Co-authored-by: Patrick von Platen Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com> Co-authored-by: Connor Henderson Co-authored-by: Lysandre Co-authored-by: Lysandre Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --- conftest.py | 1 + docs/source/en/_toctree.yml | 6 + docs/source/en/custom_tools.mdx | 503 ++++++++++++ 
docs/source/en/main_classes/agent.mdx | 64 ++ docs/source/en/transformers_agents.mdx | 329 ++++++++ src/transformers/__init__.py | 13 + src/transformers/dynamic_module_utils.py | 33 +- src/transformers/image_utils.py | 4 + src/transformers/testing_utils.py | 16 + src/transformers/tools/__init__.py | 73 ++ src/transformers/tools/agents.py | 489 ++++++++++++ src/transformers/tools/base.py | 722 ++++++++++++++++++ .../tools/document_question_answering.py | 80 ++ src/transformers/tools/evaluate_agent.py | 692 +++++++++++++++++ src/transformers/tools/image_captioning.py | 51 ++ .../tools/image_question_answering.py | 57 ++ src/transformers/tools/image_segmentation.py | 60 ++ src/transformers/tools/prompts.py | 186 +++++ src/transformers/tools/python_interpreter.py | 238 ++++++ src/transformers/tools/speech_to_text.py | 41 + src/transformers/tools/text_classification.py | 70 ++ .../tools/text_question_answering.py | 52 ++ src/transformers/tools/text_summarization.py | 52 ++ src/transformers/tools/text_to_speech.py | 65 ++ src/transformers/tools/translation.py | 271 +++++++ src/transformers/utils/__init__.py | 1 + src/transformers/utils/hub.py | 13 +- src/transformers/utils/import_utils.py | 15 + tests/tools/__init__.py | 0 .../tools/test_document_question_answering.py | 57 ++ tests/tools/test_image_captioning.py | 53 ++ tests/tools/test_image_question_answering.py | 53 ++ tests/tools/test_image_segmentation.py | 53 ++ tests/tools/test_python_interpreter.py | 124 +++ tests/tools/test_speech_to_text.py | 38 + tests/tools/test_text_classification.py | 43 ++ tests/tools/test_text_question_answering.py | 52 ++ tests/tools/test_text_summarization.py | 64 ++ tests/tools/test_text_to_speech.py | 54 ++ tests/tools/test_tools_common.py | 100 +++ tests/tools/test_translation.py | 53 ++ 41 files changed, 4933 insertions(+), 8 deletions(-) create mode 100644 docs/source/en/custom_tools.mdx create mode 100644 docs/source/en/main_classes/agent.mdx create mode 100644 
docs/source/en/transformers_agents.mdx create mode 100644 src/transformers/tools/__init__.py create mode 100644 src/transformers/tools/agents.py create mode 100644 src/transformers/tools/base.py create mode 100644 src/transformers/tools/document_question_answering.py create mode 100644 src/transformers/tools/evaluate_agent.py create mode 100644 src/transformers/tools/image_captioning.py create mode 100644 src/transformers/tools/image_question_answering.py create mode 100644 src/transformers/tools/image_segmentation.py create mode 100644 src/transformers/tools/prompts.py create mode 100644 src/transformers/tools/python_interpreter.py create mode 100644 src/transformers/tools/speech_to_text.py create mode 100644 src/transformers/tools/text_classification.py create mode 100644 src/transformers/tools/text_question_answering.py create mode 100644 src/transformers/tools/text_summarization.py create mode 100644 src/transformers/tools/text_to_speech.py create mode 100644 src/transformers/tools/translation.py create mode 100644 tests/tools/__init__.py create mode 100644 tests/tools/test_document_question_answering.py create mode 100644 tests/tools/test_image_captioning.py create mode 100644 tests/tools/test_image_question_answering.py create mode 100644 tests/tools/test_image_segmentation.py create mode 100644 tests/tools/test_python_interpreter.py create mode 100644 tests/tools/test_speech_to_text.py create mode 100644 tests/tools/test_text_classification.py create mode 100644 tests/tools/test_text_question_answering.py create mode 100644 tests/tools/test_text_summarization.py create mode 100644 tests/tools/test_text_to_speech.py create mode 100644 tests/tools/test_tools_common.py create mode 100644 tests/tools/test_translation.py diff --git a/conftest.py b/conftest.py index 53efec7a6c..683b47705b 100644 --- a/conftest.py +++ b/conftest.py @@ -43,6 +43,7 @@ def pytest_configure(config): ) config.addinivalue_line("markers", "is_staging_test: mark test to run only in the 
staging environment") config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate") + config.addinivalue_line("markers", "tool_tests: mark the tool tests that are run on their specific schedule") def pytest_addoption(parser): diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index c92f21a934..c2c5008227 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -21,6 +21,8 @@ title: Set up distributed training with 🤗 Accelerate - local: model_sharing title: Share your model + - local: transformers_agents + title: Agents title: Tutorials - sections: - sections: @@ -99,6 +101,8 @@ title: Notebooks with examples - local: community title: Community resources + - local: custom_tools + title: Custom Tools - local: troubleshooting title: Troubleshoot title: Developer guides @@ -179,6 +183,8 @@ title: Conceptual guides - sections: - sections: + - local: main_classes/agent + title: Agents and Tools - local: model_doc/auto title: Auto Classes - local: main_classes/callback diff --git a/docs/source/en/custom_tools.mdx b/docs/source/en/custom_tools.mdx new file mode 100644 index 0000000000..f69a2cde90 --- /dev/null +++ b/docs/source/en/custom_tools.mdx @@ -0,0 +1,503 @@
+
+# Custom Tools and Prompts
+
+<Tip>
+
+If you are not aware of what tools and agents are in the context of transformers, we recommend you read the
+[Transformers Agents](transformers_agents) page first.
+
+</Tip>
+
+<Tip warning={true}>
+
+Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
+can vary as the APIs or underlying models are prone to change.
+
+</Tip>
+
+Creating and using custom tools and prompts is paramount to empowering the agent and having it perform new tasks.
+In this guide we'll take a look at:
+
+- How to customize the prompt
+- How to use custom tools
+- How to create custom tools
+
+## Customizing the prompt
+
+As explained in [Transformers Agents](transformers_agents), agents can run in [`~Agent.run`] and [`~Agent.chat`] mode.
+Both the `run` and `chat` modes rely on the same underlying logic: the language model powering the agent is conditioned on a long prompt
+and asked to complete the prompt by generating the next tokens until the stop token is reached.
+The only difference between the `run` and `chat` modes is that during `chat` mode the prompt is extended with
+previous user inputs and model generations, which gives the agent a kind of memory and allows it to refer to
+past interactions.
+
+Let's take a closer look at how the prompt is structured to understand how it can best be customized.
+The prompt is structured broadly into four parts.
+
+- 1. Introduction: how the agent should behave, explanation of the concept of tools.
+- 2. Description of all the tools. This is defined by a `<<all_tools>>` token that is dynamically replaced at runtime with the tools defined/chosen by the user.
+- 3. A set of examples of tasks and their solutions.
+- 4. The current example, and the request for a solution.
+
+To better understand each part, let's look at a shortened version of what such a prompt can look like in practice.
+
+```
+I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
+[...]
+You can print intermediate results if it makes sense to do so.
+
+Tools:
+- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.
+- image_captioner: This is a tool that generates a description of an image.
It takes an input named `image` which should be the image to caption, and returns a text that contains the description in English.
+[...]
+
+Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French."
+
+I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.
+
+Answer:
+```py
+translated_question = translator(question=question, src_lang="French", tgt_lang="English")
+print(f"The translated question is {translated_question}.")
+answer = image_qa(image=image, question=translated_question)
+print(f"The answer is {answer}")
+```
+
+Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner."
+
+I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
+
+Answer:
+```py
+answer = document_qa(document, question="What is the oldest person?")
+print(f"The answer is {answer}.")
+image = image_generator("A banner showing " + answer)
+```
+
+[...]
+
+Task: "Draw me a picture of rivers and lakes"
+
+I will use the following
+```
+
+The first part explains precisely how the model should behave and what it should do. This part
+most likely does not need to be customized.
+
+### Customizing the tool descriptions
+
+The second part of the prompt is built dynamically from the `name` and `description` attributes of each tool, so writing accurate,
+well-structured values for these two attributes is the most direct way to influence how the agent uses a tool.
+
+The performance of the agent is directly linked to the prompt itself. We structure the prompt so that it works well
+with what we intend for the agent to do; but for maximum customization we also offer the ability to specify a different prompt when instantiating the agent.
+
+### Customizing the single-execution prompt
+
+In order to specify a custom single-execution prompt, one would do the following:
+
+```py
+template = """ [...]
""" + +agent = HfAgent(your_endpoint, run_prompt_template=template) +``` + + + +Please make sure to have the `<>` string defined somewhere in the `template` so that the agent can be aware +of the tools it has available to it. + + + +#### Chat-execution prompt + +In order to specify a custom single-execution prompt, one would so the following: + +``` +template = """ [...] """ + +agent = HfAgent( + url_endpoint=your_endpoint, + token=your_hf_token, + chat_prompt_template=template +) +``` + + + +Please make sure to have the `<>` string defined somewhere in the `template` so that the agent can be +aware of the tools it has available to it. + + + +## Using custom tools + +In this section, we'll be leveraging two existing custom tools that are specific to image generation: + +- We replace [huggingface-tools/image-transformation](https://huggingface.co/spaces/huggingface-tools/image-transformation), + with [diffusers/controlnet-canny-tool](https://huggingface.co/spaces/diffusers/controlnet-canny-tool) + to allow for more image modifications. +- We add a new tool for image upscaling to the default toolbox: + [diffusers/latent-upscaler-tool](https://huggingface.co/spaces/diffusers/latent-upscaler-tool) replace the existing image-transformation tool. + +We'll start by loading the custom tools with the convenient [`load_tool`] function: + +```py +from transformers import load_tool + +controlnet_transformer = load_tool("diffusers/controlnet-canny-tool") +upscaler = load_tool("diffusers/latent-upscaler-tool") +``` + +Upon adding custom tools to an agent, the tools' descriptions and names are automatically +included in the agents' prompts. Thus, it is imperative that custom tools have +a well-written description and name in order for the agent to understand how to use them. 
+Let's take a look at the description and name of `controlnet_transformer`:
+
+```py
+print(f"Description: '{controlnet_transformer.description}'")
+print(f"Name: '{controlnet_transformer.name}'")
+```
+
+which gives:
+```
+Description: 'This is a tool that transforms an image with ControlNet according to a prompt.
+It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. It returns the modified image.'
+Name: 'image_transformer'
+```
+
+The name and description are accurate and fit the style of the [curated set of tools](./transformers_agents#a-curated-set-of-tools).
+Next, let's instantiate an agent with `controlnet_transformer` and `upscaler`:
+
+```py
+tools = [controlnet_transformer, upscaler]
+agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=tools)
+```
+
+This command should give you the following info:
+
+```
+image_transformer has been replaced by <transformers_modules.diffusers.controlnet-canny-tool.[...].ControlNetTransformationTool object at 0x[...]> as provided in `additional_tools`
+```
+
+The set of curated tools already has an `image_transformer` tool, which is hereby replaced with our custom tool.
+
+<Tip>
+
+Overwriting existing tools can be beneficial if we want to use a custom tool for exactly the same task as an existing tool,
+because the agent is well-versed in that specific task. Beware that the custom tool should follow the exact same API
+as the overwritten tool in this case.
+
+</Tip>
+
+The upscaler tool was given the name `image_upscaler`, which is not yet present in the default toolbox and is therefore simply added to the list of tools.
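The replace-or-append behavior described above can be sketched as follows; `merge_toolbox` and `FakeTool` are hypothetical names used purely for illustration, not the actual agent internals:

```python
from collections import namedtuple

# Hypothetical sketch of the toolbox update rule described above: a custom tool
# whose name matches an existing tool replaces it; otherwise it is appended.
FakeTool = namedtuple("FakeTool", ["name", "description"])


def merge_toolbox(default_toolbox, additional_tools):
    toolbox = dict(default_toolbox)
    for tool in additional_tools:
        if tool.name in toolbox:
            print(f"{tool.name} has been replaced as provided in `additional_tools`")
        toolbox[tool.name] = tool
    return toolbox


default_toolbox = {"image_transformer": FakeTool("image_transformer", "default transformer")}
additional = [
    FakeTool("image_transformer", "ControlNet-based transformer"),
    FakeTool("image_upscaler", "Upscales an image"),
]
toolbox = merge_toolbox(default_toolbox, additional)
print(sorted(toolbox.keys()))  # → ['image_transformer', 'image_upscaler']
```

This is why the name of a custom tool matters: reusing an existing name overwrites the default tool, while a new name extends the toolbox.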
+You can always have a look at the toolbox that is currently available to the agent via the `agent.toolbox` attribute:
+
+```py
+print("\n".join([f"- {a}" for a in agent.toolbox.keys()]))
+```
+
+```
+- document_qa
+- image_captioner
+- image_qa
+- image_segmenter
+- transcriber
+- summarizer
+- text_classifier
+- text_qa
+- text_reader
+- translator
+- image_transformer
+- text_downloader
+- image_generator
+- video_generator
+- image_upscaler
+```
+
+Note how `image_upscaler` is now part of the agent's toolbox.
+
+Let's now try out the new tools! We will re-use the image we generated in the [Transformers Agents Quickstart](./transformers_agents#single-execution-run):
+
+```py
+from diffusers.utils import load_image
+
+image = load_image(
+    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png"
+)
+```
+
+Let's transform the image into a beautiful winter landscape:
+
+```py
+image = agent.run("Transform the image: 'A frozen lake and snowy forest'", image=image)
+```
+
+```
+==Explanation from the agent==
+I will use the following tool: `image_transformer` to transform the image.
+
+
+==Code generated by the agent==
+image = image_transformer(image, prompt="A frozen lake and snowy forest")
+```
+
+The new image processing tool is based on ControlNet, which can make very strong modifications to the image.
+By default the image processing tool returns an image of size 512x512 pixels. Let's see if we can upscale it.
+
+```py
+image = agent.run("Upscale the image", image)
+```
+
+```
+==Explanation from the agent==
+I will use the following tool: `image_upscaler` to upscale the image.
+
+
+==Code generated by the agent==
+upscaled_image = image_upscaler(image)
+```
+
+The agent automatically mapped our prompt "Upscale the image" to the newly added upscaler tool purely based on the description and name of the upscaler tool,
+and was able to run it correctly.
+
+Next, let's have a look at how you can create a new custom tool.
+
+### Adding new tools
+
+In this section, we show how to create a new tool that can be added to the agent.
+
+#### Creating a new tool
+
+We'll start by creating a tool for the not-so-useful yet fun task of fetching the model on the Hugging Face
+Hub with the most downloads for a given task.
+
+We can do that with the following code:
+
+```python
+from huggingface_hub import list_models
+
+task = "text-classification"
+
+model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
+print(model.id)
+```
+
+For the task `text-classification`, this returns `'facebook/bart-large-mnli'`; for `translation`, it returns `'t5-base'`.
+
+How do we convert this to a tool that the agent can leverage? All tools depend on the superclass `Tool` that holds the
+main attributes necessary. We'll create a class that inherits from it:
+
+```python
+from transformers import Tool
+
+
+class HFModelDownloadsTool(Tool):
+    pass
+```
+
+This class has a few requirements:
+- An attribute `name`, which corresponds to the name of the tool itself. To be in tune with other tools, which have
+  performative names, we'll name it `model_download_counter`.
+- An attribute `description`, which will be used to populate the prompt of the agent.
+- `inputs` and `outputs` attributes. Defining these will help the Python interpreter make educated choices about types,
+  and will allow a Gradio demo to be spawned when we push our tool to the Hub. Both are lists of expected
+  values, which can be `text`, `image`, or `audio`.
+- A `__call__` method which contains the inference code. This is the code we've played with above!
+
+Here's what our class looks like now:
+
+```python
+from transformers import Tool
+from huggingface_hub import list_models
+
+
+class HFModelDownloadsTool(Tool):
+    name = "model_download_counter"
+    description = (
+        "This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. "
+        "It takes the name of the category (such as text-classification, depth-estimation, etc), and "
+        "returns the name of the checkpoint."
+    )
+
+    inputs = ["text"]
+    outputs = ["text"]
+
+    def __call__(self, task: str):
+        model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
+        return model.id
+```
+
+We now have our tool handy. Save it in a file and import it from your main script. Let's name this file
+`model_downloads.py`, so the resulting import code looks like this:
+
+```python
+from model_downloads import HFModelDownloadsTool
+
+tool = HFModelDownloadsTool()
+```
+
+In order to let others benefit from it, and for simpler initialization, we recommend pushing it to the Hub under your
+namespace. To do so, just call `push_to_hub` on the `tool` variable:
+
+```python
+tool.push_to_hub("lysandre/hf-model-downloads")
+```
+
+You now have your code on the Hub! Let's take a look at the final step, which is to have the agent use it.
+
+#### Having the agent use the tool
+
+Our tool now lives on the Hub and can be instantiated as follows:
+
+```python
+from transformers import load_tool
+
+tool = load_tool("lysandre/hf-model-downloads")
+```
+
+In order to use it in the agent, simply pass it to the `additional_tools` parameter of the agent initialization method:
+
+```python
+from transformers import HfAgent
+
+agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[tool])
+
+agent.run(
+    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
+)
+```
+which outputs the following:
+```
+==Code generated by the agent==
+model = model_download_counter(task="text-to-video")
+print(f"The model with the most downloads is {model}.")
+audio_model = text_reader(model)
+
+
+==Result==
+The model with the most downloads is damo-vilab/text-to-video-ms-1.7b.
+```
+
+and generates the following audio.
+
+| **Audio** |
+|-----------|
+| *(audio sample)* |