21 Commits

Author SHA1 Message Date
Joshua Lochner 374c1052a7
Standardize `HF_ACCESS_TOKEN` -> `HF_TOKEN` (#431) 2023-12-06 18:33:50 +02:00
Joshua Lochner 9a8c664c2c
Documentation improvements (#299)
* Add link to optimum docs for supported architectures

Closes #288

* Refactor `SUPPORTED_MODELS` dict to include task

* Update example model id

* Update list of supported models

* Update generate_tests.py

* Remove requirement of `output_attentions` revision

* Add demo site to examples section (closes #233)

* Fix typo

* Include examples in docs index

* Update github issue templates

* Create config.yml

* Order supported models

* Cleanup

* Update 4_feature-request.yml
2023-12-06 17:01:36 +02:00
Joshua Lochner 96c5dd4ccf
Fix `text2text-generation` pipeline output inconsistency w/ python library (#384)
* Fix `text2text-generation` pipeline inconsistency

See https://huggingface.co/docs/transformers/v4.35.0/en/main_classes/pipelines#transformers.Text2TextGenerationPipeline

* Fix `text2text-generation` example in docs

* Improve text2text-generation output in docs
2023-11-09 16:08:27 +02:00
Per Harald Borgen ef27100553
Add JavaScript tutorial to the docs (#271) 2023-09-17 21:47:09 +02:00
Joshua Lochner 8253dab531
Update node and NPM versions (#294)
node -> 18+
npm -> 9+
2023-09-12 19:17:05 +02:00
Doni Rubiagatra 434dcccac7
[docs] Update minimum node version (16 -> 18) 2023-09-11 13:01:05 +02:00
Per Harald Borgen 76b8556110
Rename how-to guides to developer guides (#261) 2023-08-25 17:56:18 +02:00
Joshua Lochner 276bdd06b8
Improve pipeline docs (w/ example code) - closes #134 (#255)
* Add example code for zero shot image classification

* Add example code for text classification pipeline

* Fix links to custom usage from pipelines docs

Reported on discord https://discord.com/channels/879548962464493619/1142943169068154950/1142943169068154950

* Fix relative links

* Rename .mdx -> .md

GitHub recently changed how mdx files are displayed, breaking a lot of the formatting. So, we just use .md now (same as transformers)

* Add example code for token classification pipeline

* Add example code for fill-mask pipeline

* Add text2text and summarization pipeline examples

* Add example code for image segmentation pipeline

* Remove redundant `@extends Pipeline`

* Add example code for image-to-text pipeline

* Cleanup example code outputs

* Cleanup JSDoc

* Cleanup pipeline example code

* Update codegen example
2023-08-22 04:30:56 +02:00
Joshua Lochner d479953a62
[WIP] Add MMS and Wav2Vec2 models (Closes #209) (#220)
* Add example `wav2vec2` models

* Add support for `CTCDecoder` and `Wav2Vec2CTCTokenizer`

* Generate tokenizer.json files for wav2vec2 models

* Fix wav2vec2 custom tokenizer generation

* Implement wav2vec2 audio-speech-recognition

* Add `Wav2Vec2` as a supported architecture

* Update README.md

* Update generate_tests.py

* Ignore invalid tests

* Update supported wav2vec2 models

* Update supported_models.py

* Simplify pipeline construction

* Implement basic audio classification pipeline

* Update default topk value for audio classification pipeline

* Add example usage for the audio classification pipeline

* Move `loadAudio` to utils file

* Add audio classification unit test

* Add wav2vec2 ASR unit test

* Improve generated wav2vec2 tokenizer json

* Update supported_models.py

* Allow `added_tokens_regex` to be null

* Support exporting mms vocabs

* Supported nested vocabularies

* Update supported tasks and models

* Add warnings to ignore language and task for wav2vec2 models

Will add in future

* Mark internal methods as private

* Add typing to audio variable

* Update node-audio-processing.mdx

* Move node-audio-processing to guides

* Update table of contents

* Add example code for performing feature extraction w/ `Wav2Vec2Model`

NOTE: feature extraction of MMS models is currently broken in the python library, but it works correctly here. See
https://github.com/huggingface/transformers/issues/25485 for more info

* Refactor `Pipeline` class params

* Fix `pipeline` function

* Fix typo in `pipeline` JSDoc

* Fix second typo
2023-08-14 22:18:44 +02:00
Joshua Lochner db7d0f0f83
Tokenization improvements (#234)
* Create basic tokenizer playground app

* Default to no display when user adding large body of text

* Optimize BPE algorithm

- Use map instead of object for `bpe_ranks`
- Replace reduction in BPE algorithm with for loop
- Avoid conversions between sets and arrays

* Use for loop to avoid stack issues with `.push(...items)`

* Fix `mergeArrays` typing

* Remove unnecessary try-catch block in BPE

* Add Llama, T5, and BERT tokenizers to the playground

* Improve how BERT/T5 tokens are displayed

* Improve how token margins are displayed

* Use `Map` for cache

* Add efficient heap-based priority queue implementation

* Add more unit tests for LlamaTokenizer

Selected from https://github.com/belladoreai/llama-tokenizer-js/blob/master/llama-tokenizer.js#L381-L452

* Implement priority-queue-based BPE algorithm

* Remove old code

* Update `bpe` docstring

* Add `data-structures` page to docs

* Update JSDoc for data-structures.js

* Update data-structures.js

* Move `TokenLattice` and `CharTrie` to data-structures module

* Minor refactoring
2023-08-08 12:11:35 +02:00
Joshua Lochner 09ff83b90e
Create example next.js application (Closes #210) (#211)
* Create example next app

* Link to example app

* Update next configs

* Create tutorial for next.js application

* Update next.js tutorial

* Rename project `next` -> `next-client`

* Clone `next-server` from `next-client`

* Update next.config.js for server-side inference

* Create basic server-side next.js application

* Update example links

* Update subheading for client-side next.js app

* Update next.config.js files

* Create example Dockerfile

* Update next tutorial to include server-side inference

* Improve wording

* Update Dockerfile

* Add step to create a Dockerfile

* Update examples snippet

* Fix wording
2023-07-26 01:48:13 +02:00
Joshua Lochner 86e68bf9c0
Add support for private/gated model access (Closes #198) (#202)
* Allow user to specify HF token as an environment variable

* Add documentation for how to make authorized requests

* Improve docs
2023-07-21 17:31:37 +02:00
Joshua Lochner f112349a28
Object-detection pipeline improvements + better documentation (#189)
* Fix variable name

* Add pipeline loading options section

* Align object detection pipeline output with python library

* Update unit tests

* Update batched object detection unit test

* Relax object detection unit tests
2023-07-11 02:09:03 +02:00
Joshua Lochner 27d7ea489b
Improvements to documentation (#172)
* link to the conversion Space for maximum simplicity

* add some types to script (very optional)

* typo

* no need for trailing slash here

* Node is also a valid option

* Document how to find a compatible checkpoint on the hub

* Update README

* Fix typing

* Update docs index

---------

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2023-06-29 19:32:17 +02:00
Joshua Lochner d90f58110a
Add whisper unit tests (#155)
* Only run encoder with required inputs

* Add basic whisper unit tests

* Add newline after heading for docs

* Add unit test for transcribing english with timestamps

* Add multilingual test case
2023-06-21 23:58:16 +02:00
Joshua Lochner 573012b434
[docs] Add tutorial + example app for server-side whisper (#147)
* Update typo in node tutorial

* Create node audio processing tutorial

* Point to tutorial in `read_audio` function

* Rename `.md` to `.mdx`

* Add node audio processing tutorial to table of contents

* Add link to model in tutorial

* Update error message grammar
2023-06-20 23:10:33 +02:00
Joshua Lochner 569f3f820a [docs] Add JSDoc for configs.js 2023-05-31 19:28:44 +02:00
Joshua Lochner 55d6ef41b3 [docs] Fix numbering 2023-05-29 17:49:46 +02:00
Joshua Lochner 75ec68ed8b Create example Node.js application 2023-05-17 12:34:48 +02:00
Joshua Lochner dff2b3bf56 [docs] Move snippets out of source folder 2023-05-15 10:57:59 +02:00
Joshua Lochner a73e8559b8 Add docs folder 2023-05-13 19:59:18 +02:00