From 9a2995ee394b8f45fa3a5c12187e06886b77691a Mon Sep 17 00:00:00 2001
From: Patrick von Platen
Date: Mon, 18 Apr 2022 16:50:13 +0200
Subject: [PATCH] [Quicktour Audio] Improve && remove ffmpeg dependency (#16723)

* [Quicktour Audio] Improve && remove ffmpeg dependency

* final fix

* final touches
---
 docs/source/en/quicktour.mdx | 24 ++++++++++++++++--------
 docs/source/es/quicktour.mdx | 24 ++++++++++++++++--------
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/docs/source/en/quicktour.mdx b/docs/source/en/quicktour.mdx
index 057196a781..c1b23995aa 100644
--- a/docs/source/en/quicktour.mdx
+++ b/docs/source/en/quicktour.mdx
@@ -118,20 +118,28 @@ Create a [`pipeline`] with the task you want to solve for and the model you want

 Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset:

 ```py
->>> from datasets import load_dataset
+>>> from datasets import load_dataset, Audio

 >>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
 ```

-You can pass a whole dataset pipeline:
+We need to make sure that the sampling rate of the dataset matches the sampling
+rate `facebook/wav2vec2-base-960h` was trained on.
 ```py
->>> files = dataset["path"]
->>> speech_recognizer(files[:4])
-[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
- {'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
- {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
- {'text': 'HOW DO I FURN A JOINA COUT'}]
+>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
+```
+
+Audio files are automatically loaded and resampled when calling the `"audio"` column.
+Let's extract the raw waveform arrays of the first 4 samples and pass them as a list to the pipeline:
+
+```py
+>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]
+>>> speech_recognizer(raw_audio_waveforms)
+[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
+ {'text': "FONDERING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE"},
+ {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
+ {'text': 'HOW DO I TURN A JOIN A COUNT'}]
 ```

 For a larger dataset where the inputs are big (like in speech or vision), you will want to pass along a generator instead of a list that loads all the inputs in memory. See the [pipeline documentation](./main_classes/pipelines) for more information.
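The `cast_column(..., Audio(sampling_rate=...))` call above is what lets the patch drop the ffmpeg dependency: 🤗 Datasets resamples the waveform to the 16 kHz rate the wav2vec2 feature extractor expects. As a library-free sketch of what resampling means (the real `Audio` feature uses a proper DSP resampler, not this toy linear interpolation; the function name is ours):

```python
# Toy illustration of resampling: map a waveform recorded at src_rate onto
# a new time grid at dst_rate by linear interpolation between neighbours.
# This is only a conceptual stand-in for the resampler 🤗 Datasets uses.

def naive_resample(waveform, src_rate, dst_rate):
    """Resample `waveform` from `src_rate` to `dst_rate` via linear interpolation."""
    n_out = round(len(waveform) * dst_rate / src_rate)
    if n_out <= 1 or len(waveform) <= 1:
        return list(waveform[:n_out])
    out = []
    for i in range(n_out):
        pos = i * (len(waveform) - 1) / (n_out - 1)  # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(waveform) - 1)
        frac = pos - lo
        out.append(waveform[lo] * (1 - frac) + waveform[hi] * frac)
    return out

# Doubling the rate (e.g. 8 kHz -> 16 kHz) doubles the number of samples.
upsampled = naive_resample([0.0, 1.0, 2.0, 3.0], 8_000, 16_000)
print(len(upsampled))  # 8
```

The endpoints of the original waveform are preserved; only the density of samples between them changes.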
diff --git a/docs/source/es/quicktour.mdx b/docs/source/es/quicktour.mdx
index 7b58c987b7..16a2eacd9e 100644
--- a/docs/source/es/quicktour.mdx
+++ b/docs/source/es/quicktour.mdx
@@ -118,19 +118,27 @@ Crea un [`pipeline`] con la tarea que deseas resolver y el modelo que quieres us

 A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) para más detalles) sobre el que quisieras iterar. Por ejemplo, vamos a cargar el dataset [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14):

 ```py
->>> import datasets
+>>> from datasets import load_dataset, Audio

->>> dataset = datasets.load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
+>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
 ```

-Puedes pasar un pipeline para un dataset:
+Debemos asegurarnos de que la frecuencia de muestreo del conjunto de datos coincide con la frecuencia de muestreo con la que se entrenó `facebook/wav2vec2-base-960h`.

 ```py
->>> files = dataset["path"]
->>> speech_recognizer(files[:4])
-[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
- {'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
- {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
+>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
+```
+
+Los archivos de audio se cargan y remuestrean automáticamente cuando se llama a la columna `"audio"`.
+Extraigamos las matrices de forma de onda cruda de las primeras 4 muestras y pasémoslas como una lista al pipeline:
+
+```py
+>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]
+>>> speech_recognizer(raw_audio_waveforms)
+[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
+ {'text': "FONDERING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE"},
+ {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
+ {'text': 'HOW DO I TURN A JOIN A COUNT'}]
 ```

 Para un dataset más grande, donde los inputs son de mayor tamaño (como en habla/audio o visión), querrás pasar un generador en lugar de una lista que carga todos los inputs en memoria. Ve la [documentación del pipeline](./main_classes/pipelines) para más información.
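Both hunks close by recommending a generator over an in-memory list for large datasets. A minimal sketch of that pattern follows; the `fake_dataset` rows are made-up stand-ins that only mimic the dict layout of a 🤗 Datasets sample, and a pipeline would consume the generator the same way it consumes the list in the diff:

```python
# Sketch of the generator pattern recommended above: yield one waveform at a
# time instead of materializing every array in a list. Only the sample
# currently being transcribed needs to live in memory.

def waveform_generator(dataset):
    """Lazily yield the raw waveform array of each sample."""
    for sample in dataset:
        yield sample["audio"]["array"]

# Hypothetical rows mimicking 🤗 Datasets' {"audio": {"array": ..., "sampling_rate": ...}} layout.
fake_dataset = [
    {"audio": {"array": [0.0, 0.1], "sampling_rate": 16_000}},
    {"audio": {"array": [0.2, 0.3], "sampling_rate": 16_000}},
]

gen = waveform_generator(fake_dataset)
print(next(gen))  # [0.0, 0.1] -- the second waveform has not been touched yet
```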