first add

fanshuai 2024-04-23 14:42:19 +08:00
parent 9aade9b400
commit 5d56bb5d39
1143 changed files with 234557 additions and 42 deletions

16
.dockerignore Normal file

@@ -0,0 +1,16 @@
# Ignore everything:
**
# Except:
!deploy
!label_studio
!setup.py
!requirements.txt
!README.md
!wait-for-it.sh
# but ignore:
label_studio/frontend
# except
!label_studio/frontend/dist

52
.gitignore vendored Normal file

@@ -0,0 +1,52 @@
node_modules
*.lock
*.pyc
__pycache__
# docs
__generated__
\#$
data/
etc/
/src/
yarn-error.log
# mobile/builds
db/
logfile
env/
venv/
.venv
.vscode
# web/static
deploy/license.txt
deploy/logs/
deploy/redis-data/
deploy/dockerfiles/postgres-data
label_studio/core/media
label_studio/core/downloads
label_studio/core/export
label_studio/core/static_build
label_studio/core/version_.py
label_studio/core/core
label_studio/tests/test_data/tasks_and_annotations.json
archive
*.rtl.css
*.rtl.min.css
*.idea/*
.vscode/*
pyrightconfig.json
tags
tags.temp
mydatabase
label_studio.sqlite3
*.egg-info
tmp
.DS_Store
*.sqlite3

13
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,13 @@
# Contributor Code of Conduct
As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.0.0, available at https://www.contributor-covenant.org/version/1/0/0/code-of-conduct.html

13
CONTRIBUTING.md Normal file

@@ -0,0 +1,13 @@
# Contributing to Label Studio
First off, thanks for taking the time to contribute!
This part of the documentation gives you a basic overview of how to help with the development of Label Studio.
## Reporting Bugs
## Pull Requests
## Code of conduct
We value input from each member of the community; however, we urge you to abide by the [code of conduct](https://github.com/heartexlabs/label-studio/blob/master/CODE_OF_CONDUCT.md).

29
Dockerfile Normal file

@@ -0,0 +1,29 @@
# Building the main container
FROM ubuntu:20.04
WORKDIR /label-studio
ENV TZ=Europe/Berlin
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update && apt-get install -y build-essential postgresql-client python3.8 python3-pip python3.8-dev uwsgi git libxml2-dev libxslt-dev zlib1g-dev
RUN chgrp -R 0 /var/log /var/cache /var/run /run /tmp /etc/uwsgi && \
chmod -R g+rwX /var/log /var/cache /var/run /run /tmp /etc/uwsgi
# Copy and install requirements.txt first for caching
COPY deploy/requirements.txt /label-studio
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt && pip3 install uwsgi
ENV DJANGO_SETTINGS_MODULE=core.settings.label_studio
ENV LABEL_STUDIO_BASE_DATA_DIR=/label-studio/data
COPY . /label-studio
RUN python3.8 setup.py develop
EXPOSE 8080
RUN ./deploy/prebuild_wo_frontend.sh
ENTRYPOINT ["./deploy/docker-entrypoint.sh"]
CMD bash /label-studio/deploy/start_label_studio.sh

210
LICENSE

@@ -1,73 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2019 Heartex, Inc

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

31
MANIFEST.in Normal file

@@ -0,0 +1,31 @@
include deploy/requirements.txt
# react LSF / react-app with dm
recursive-include label_studio/frontend/dist/lsf *
include label_studio/frontend/dist/react-app/*
recursive-include label_studio/frontend/dist/dm/ *
# html template files
recursive-include label_studio *.html
# annotation templates
recursive-include label_studio/annotation_templates *
# core
recursive-include label_studio/core/static *
recursive-include label_studio/core/static_build *
include label_studio/core/utils/schema/*.json
include label_studio/core/templatetags/*.py
include label_studio/core/version_.py
# io storages
recursive-include label_studio/io_storages *.yml
# tests
recursive-include label_studio/tests *.sh
recursive-include label_studio/tests/loadtests *.txt
recursive-include label_studio/tests/test_data *.yml
recursive-include label_studio/tests/test_suites/samples *
recursive-include label_studio/tests/test_suites *.yml
include label_studio/pytest.ini

31
Makefile Normal file

@@ -0,0 +1,31 @@
# Run Django dev server with SQLite
run-dev:
	DJANGO_DB=sqlite LOG_DIR=tmp DEBUG=true LOG_LEVEL=DEBUG DJANGO_SETTINGS_MODULE=core.settings.label_studio python label_studio/manage.py runserver

# Run Django dev migrations with SQLite
migrate-dev:
	DJANGO_DB=sqlite LOG_DIR=tmp DEBUG=true LOG_LEVEL=DEBUG DJANGO_SETTINGS_MODULE=core.settings.label_studio python label_studio/manage.py migrate

# Run Django dev shell environment with SQLite
shell-dev:
	DJANGO_DB=sqlite LOG_DIR=tmp DEBUG=true LOG_LEVEL=DEBUG DJANGO_SETTINGS_MODULE=core.settings.label_studio python label_studio/manage.py shell_plus

# Install modules
frontend-setup:
	cd label_studio/frontend && npm ci && npm run download:all;

# Fetch DM and LSF
frontend-fetch:
	cd label_studio/frontend && npm run download:all;

# Build frontend continuously on file changes
frontend-watch:
	cd label_studio/frontend && npm start

# Build production-ready optimized bundle
frontend-build:
	cd label_studio/frontend && npm ci && npm run build:production

# Run tests
test:
	cd label_studio && pytest -v -m "not integration_tests"

4
NOTICE Normal file

@@ -0,0 +1,4 @@
Label Studio (TM)
Copyright (c) 2019-2021 Heartex, Inc. All Rights Reserved.
Source code in this repository is licensed under the Apache License
Version 2.0. Please see LICENSE for more information.

215
README.md

@@ -1,2 +1,215 @@
<img src="https://raw.githubusercontent.com/heartexlabs/label-studio/master/images/ls_github_header.png"/>
![GitHub](https://img.shields.io/github/license/heartexlabs/label-studio?logo=heartex) ![label-studio:build](https://github.com/heartexlabs/label-studio/workflows/label-studio:build/badge.svg) ![GitHub release](https://img.shields.io/github/v/release/heartexlabs/label-studio?include_prereleases)
[Website](https://labelstud.io/) • [Docs](https://labelstud.io/guide/) • [Twitter](https://twitter.com/heartexlabs) • [Join Slack Community <img src="https://app.heartex.ai/docs/images/slack-mini.png" width="18px"/>](http://slack.labelstud.io.s3-website-us-east-1.amazonaws.com?source=github-1)
## What is Label Studio?
<a href="https://labelstud.io/blog/release-100.html"><img src="https://github.com/heartexlabs/label-studio/raw/master/docs/themes/htx/source/images/release-100/LS-Hits-v1.0.png" align="right" /></a>
Label Studio is an open source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models.
- [Try out Label Studio](#try-out-label-studio)
- [What you get from Label Studio](#what-you-get-from-label-studio)
- [Included templates for labeling data in Label Studio](#included-templates-for-labeling-data-in-label-studio)
- [Set up machine learning models with Label Studio](#set-up-machine-learning-models-with-label-studio)
- [Integrate Label Studio with your existing tools](#integrate-label-studio-with-your-existing-tools)
![Gif of Label Studio annotating different types of data](https://raw.githubusercontent.com/heartexlabs/label-studio/master/images/annotation_examples.gif)
Have a custom dataset? You can customize Label Studio to fit your needs. Read an [introductory blog post](https://towardsdatascience.com/introducing-label-studio-a-swiss-army-knife-of-data-labeling-140c1be92881) to learn more.
## Try out Label Studio
Try out Label Studio in a **[running app](https://app.labelstud.io)**, install it locally, or deploy it in a cloud instance.
- [Install locally with Docker](#install-locally-with-docker)
- [Run with Docker Compose (Label Studio + Nginx + PostgreSQL)](#run-with-docker-compose)
- [Install locally with pip](#install-locally-with-pip)
- [Install locally with Anaconda](#install-locally-with-anaconda)
- [Install for local development](#install-for-local-development)
- [Deploy in a cloud instance](#deploy-in-a-cloud-instance)
### Install locally with Docker
Run Label Studio in a Docker container and access it at `http://localhost:8080`.
```bash
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest
```
You can find all the generated assets, including SQLite3 database storage `label_studio.sqlite3` and uploaded files, in the `./mydata` directory.
#### Override default Docker install
You can override the default launch command by appending the new arguments:
```bash
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest label-studio --log-level DEBUG
```
#### Build a local image with Docker
If you want to build a local image, run:
```bash
docker build -t heartexlabs/label-studio:latest .
```
### Run with Docker Compose
The Docker Compose script provides a production-ready stack consisting of the following components:
- Label Studio
- [Nginx](https://www.nginx.com/) - proxy web server used to load various static data, including uploaded audio, images, etc.
- [PostgreSQL](https://www.postgresql.org/) - production-ready database that replaces less performant SQLite3.
To start using the app from `http://localhost` run this command:
```bash
docker-compose up
```
### Install locally with pip
```bash
# Requires Python >=3.6, <3.9
pip install label-studio
# Start the server at http://localhost:8080
label-studio
```
### Install locally with Anaconda
```bash
conda create --name label-studio python=3.8
conda activate label-studio
pip install label-studio
```
### Install for local development
You can run the latest Label Studio version locally without installing the package from pip.
```bash
# Install all package dependencies
pip install -e .
# Run database migrations
python label_studio/manage.py migrate
# Start the server in development mode at http://localhost:8080
python label_studio/manage.py runserver
```
### Deploy in a cloud instance
You can deploy Label Studio with one click in Heroku, Microsoft Azure, or Google Cloud Platform:
[<img src="https://www.herokucdn.com/deploy/button.svg" height="30px">](https://heroku.com/deploy?template=https://github.com/heartexlabs/label-studio/tree/master)
[<img src="https://aka.ms/deploytoazurebutton" height="30px">](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fheartexlabs%2Flabel-studio%2Fmaster%2Fazuredeploy.json)
[<img src="https://deploy.cloud.run/button.svg" height="30px">](https://deploy.cloud.run)
#### Apply frontend changes
The frontend part of the Label Studio app lives in the `frontend/` folder and is written in React JSX. If you've made changes there, run the following commands before building or starting the instance:
```bash
cd frontend/
npm ci
npx webpack
cd ..
python label_studio/manage.py collectstatic --no-input
```
### Troubleshoot installation
If you see any errors during installation, try rerunning the installation:
```bash
pip install --ignore-installed label-studio
```
#### Install dependencies on Windows
To run Label Studio on Windows, download and install the following wheel packages from [Gohlke builds](https://www.lfd.uci.edu/~gohlke/pythonlibs), making sure to choose the build that matches your Python version:
- [lxml](https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml)
```bash
# Upgrade pip
pip install -U pip
# If you're running Win64 with Python 3.8, install the packages downloaded from Gohlke:
pip install lxml-4.5.0-cp38-cp38-win_amd64.whl
# Install label studio
pip install label-studio
```
## What you get from Label Studio
![Screenshot of Label Studio data manager grid view with images](https://raw.githubusercontent.com/heartexlabs/label-studio/master/images/labelstudio-ui.gif)
- **Multi-user labeling**: sign up and log in; every annotation you create is tied to your account.
- **Multiple projects** to work on all your datasets in one instance.
- **Streamlined design** that helps you focus on your task, not on how to use the software.
- **Configurable label formats** let you customize the visual interface to meet your specific labeling needs.
- **Support for multiple data types** including images, audio, text, HTML, time series, and video.
- **Import from files or from cloud storage** in Amazon S3 or Google Cloud Storage, or from JSON, CSV, TSV, RAR, and ZIP archives.
- **Integration with machine learning models** so that you can visualize and compare predictions from different models and perform pre-labeling.
- **Embed it in your data pipeline**: the REST API makes it easy to make Label Studio part of your pipeline (see the sketch below).
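As a rough sketch of what that integration can look like (the endpoint and token header follow the Label Studio API reference; the URL, project ID, and token here are placeholders, not values from this repository), importing tasks from a Python script might look like this:
```python
import requests

LABEL_STUDIO_URL = "http://localhost:8080"  # local instance from the install steps above
API_TOKEN = "<your-api-token>"              # placeholder: find yours in Account & Settings
PROJECT_ID = 1                              # placeholder: ID of an existing project

# Import two plain-text tasks into the project via the REST API
response = requests.post(
    f"{LABEL_STUDIO_URL}/api/projects/{PROJECT_ID}/import",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json=[{"text": "first example task"}, {"text": "second example task"}],
)
response.raise_for_status()
print(response.json())
```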
## Included templates for labeling data in Label Studio
Label Studio includes a variety of templates to help you label your data, or you can create your own using the specially designed configuration language. The most common templates and use cases for labeling include the following:
<img src="https://raw.githubusercontent.com/heartexlabs/label-studio/master/images/templates-categories.jpg" />
## Set up machine learning models with Label Studio
Connect your favorite machine learning model using the Label Studio Machine Learning SDK (a minimal backend sketch follows below). Follow these steps:
1. Start your own machine learning backend server. See [more detailed instructions](https://github.com/heartexlabs/label-studio-ml-backend).
2. Connect Label Studio to the server on the model page found in project settings.
This lets you:
- **Pre-label** your data using model predictions.
- Do **online learning** and retrain your model while new annotations are being created.
- Do **active learning** by labeling only the most complex examples in your data.
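For orientation, here is a minimal backend sketch. It is an illustration, not the official template: it assumes the `label_studio_ml` package from the repository linked above and its `LabelStudioMLBase.predict()` hook, plus a hypothetical project whose labeling config has a `Choices` control named `sentiment` over a `Text` object named `text`:
```python
from label_studio_ml.model import LabelStudioMLBase


class ConstantSentimentModel(LabelStudioMLBase):
    """Toy backend that returns the same 'Positive' choice for every task."""

    def predict(self, tasks, **kwargs):
        predictions = []
        for _task in tasks:
            predictions.append({
                'result': [{
                    'from_name': 'sentiment',  # name of the Choices control in the labeling config
                    'to_name': 'text',         # name of the Text object it labels
                    'type': 'choices',
                    'value': {'choices': ['Positive']},
                }],
                'score': 0.5,  # constant confidence, purely for illustration
            })
        return predictions
```
A real model would replace the constant choice with actual inference and could also implement training hooks for the online and active learning scenarios listed above.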
## Integrate Label Studio with your existing tools
You can use Label Studio as an independent part of your machine learning workflow or integrate the frontend or backend into your existing tools.
* Use the [Label Studio Frontend](https://github.com/heartexlabs/label-studio-frontend) as a separate React library. See more in the [Frontend Library documentation](https://labelstud.io/guide/frontend.html).
## Ecosystem
| Project | Description |
|-|-|
| label-studio | Server, distributed as a pip package |
| [label-studio-frontend](https://github.com/heartexlabs/label-studio-frontend) | React and JavaScript frontend that can run standalone in a web browser or be embedded into your application. |
| [data-manager](https://github.com/heartexlabs/dm2) | React and JavaScript frontend for managing data. Includes the Label Studio Frontend. Relies on the label-studio server or a custom backend with the expected API methods. |
| [label-studio-converter](https://github.com/heartexlabs/label-studio-converter) | Encode labels in the format of your favorite machine learning library |
| [label-studio-transformers](https://github.com/heartexlabs/label-studio-transformers) | Transformers library connected and configured for use with Label Studio |
## Roadmap
Want to use **The Coolest Feature X** but Label Studio doesn't support it? Check out [our public roadmap](roadmap.md)!
## Citation
```tex
@misc{LabelStudio,
title={{Label Studio}: Data labeling software},
url={https://github.com/heartexlabs/label-studio},
note={Open source software available from https://github.com/heartexlabs/label-studio},
author={
Maxim Tkachenko and
Mikhail Malyuk and
Nikita Shevchenko and
Andrey Holmanyuk and
Nikolai Liubimov},
year={2020-2021},
}
```
## License
This software is licensed under the [Apache 2.0 LICENSE](/LICENSE) © 2020-2021 [Heartex](https://www.heartex.ai/).
<img src="https://github.com/heartexlabs/label-studio/blob/master/images/opossum_looking.png?raw=true" title="Hey everyone!" height="140" width="140" />

31
app.json Normal file

@@ -0,0 +1,31 @@
{
"name": "Label Studio",
"description": "Multi-type data labeling, annotation and exploration tool",
"keywords": ["data annotation", "data labeling"],
"website": "https://labelstud.io",
"repository": "https://github.com/heartexlabs/label-studio",
"logo": "https://labelstud.io/images/opossum/heartex_icon_opossum_green.svg",
"stack": "container",
"env": {
"LABEL_STUDIO_ONE_CLICK_DEPLOY": {
"description": "Label Studio One Click Deploy Environmental Flag",
"value": "1",
"required": false
},
"DISABLE_SIGNUP_WITHOUT_LINK": {
"description": "Disable signup for users without invite link",
"value": "0",
"required": false
},
"USERNAME": {
"description": "Username(email) for default user",
"value": "",
"required": false
},
"PASSWORD": {
"description": "Password for default user",
"value": "",
"required": false
}
}
}

105
azuredeploy.json Normal file

@@ -0,0 +1,105 @@
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"hostingPlanName": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Name of the hosting plan to use in Azure."
}
},
"appName": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Name of the Azure Web app to create."
}
},
"skuName": {
"type": "string",
"defaultValue": "F1",
"allowedValues": [
"F1",
"D1",
"B1",
"B2",
"B3",
"S1",
"S2",
"S3",
"P1",
"P2",
"P3",
"P4"
],
"metadata": {
"description": "Describes plan's pricing tier and instance size. Check details at https://azure.microsoft.com/en-us/pricing/details/app-service/"
}
},
"skuCapacity": {
"type": "int",
"defaultValue": 1,
"minValue": 1,
"maxValue": 3,
"metadata": {
"description": "Describes plan's instance count"
}
},
"location": {
"type": "string",
"defaultValue": "[resourceGroup().location]",
"metadata": {
"description": "Location for all resources."
}
}
},
"resources": [
{
"apiVersion": "2018-02-01",
"name": "[parameters('hostingPlanName')]",
"type": "Microsoft.Web/serverfarms",
"location": "[parameters('location')]",
"tags": {
"displayName": "HostingPlan"
},
"sku": {
"name": "[parameters('skuName')]",
"capacity": "[parameters('skuCapacity')]"
},
"properties": {
"name": "[parameters('hostingPlanName')]"
}
},
{
"apiVersion": "2018-02-01",
"name": "[parameters('appName')]",
"type": "Microsoft.Web/sites",
"location": "[parameters('location')]",
"tags": {
"[concat('hidden-related:', resourceGroup().id, '/providers/Microsoft.Web/serverfarms/', parameters('hostingPlanName'))]": "Resource",
"displayName": "Website"
},
"dependsOn": [
"[concat('Microsoft.Web/serverfarms/', parameters('hostingPlanName'))]"
],
"properties": {
"name": "[parameters('appName')]",
"serverFarmId": "[resourceId('Microsoft.Web/serverfarms', parameters('hostingPlanName'))]"
},
"resources": [
{
"apiVersion": "2018-02-01",
"name": "web",
"type": "config",
"dependsOn": [
"[concat('Microsoft.Web/sites/', parameters('appName'))]"
],
"properties": {
"pythonVersion": "3.7"
}
}
]
}
]
}


@@ -0,0 +1,12 @@
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"hostingPlanName": {
"value": "GEN-UNIQUE"
},
"appName": {
"value": "GEN-UNIQUE"
}
}
}

2
codecov.yml Normal file

@@ -0,0 +1,2 @@
fixes:
- "::label_studio/"

17
deploy/docker-entrypoint.sh Normal file

@@ -0,0 +1,17 @@
#!/usr/bin/env bash
set -e
# If the wait-for-postgres script is specified, sleep in a loop until Postgres is up
if [[ "$1" = "./deploy/wait-for-postgres.sh" ]]; then
# ./deploy/wait-for-postgres.sh db
$1
shift 1
fi
echo "=> Do database migrations..."
python3 label_studio/manage.py migrate
echo "=> Run $@..."
exec "$@"


@@ -0,0 +1,54 @@
# This file is distributed as an example of serving Label Studio behind a proxy
# under the sub-path http://localhost/foo/, without any warranties.
# To adjust it to your deployment scenario, manually modify ./deploy/nginx/subpath.example.simple.conf,
# the Label Studio env vars, and the Postgres DB settings.
version: '3.3'

services:
  nginx:
    image: nginx:latest
    ports:
      - 80:80
    depends_on:
      - app
    volumes:
      - static:/label-studio/label_studio:rw
      - ./mydata:/label-studio/data:rw
      - ../nginx/subpath.example.simple.conf:/etc/nginx/conf.d/default.conf:ro
    command: nginx -g "daemon off;"

  app:
    stdin_open: true
    tty: true
    image: heartexlabs/label-studio:latest
    ports:
      - 8080:8080
    depends_on:
      - db
    environment:
      - DJANGO_DB=default
      - POSTGRE_NAME=postgres
      - POSTGRE_USER=postgres
      - POSTGRE_PASSWORD=
      - POSTGRE_PORT=5432
      - POSTGRE_HOST=db
      - LABEL_STUDIO_HOST=http://localhost/foo
    volumes:
      - ./mydata:/label-studio/data:rw
    command: [ "./deploy/wait-for-postgres.sh", "db", "bash", "/label-studio/deploy/start_label_studio.sh" ]

  db:
    image: postgres:11.5
    hostname: db
    restart: always
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data
    ports:
      - 5432:5432

volumes:
  static: {}

3
deploy/heroku_run.sh Normal file

@@ -0,0 +1,3 @@
#!/bin/bash
label-studio --host ${HOST:-""} --port ${PORT} --username ${USERNAME} --password ${PASSWORD}

27
deploy/install_npm.sh Normal file

@@ -0,0 +1,27 @@
#!/bin/bash
cat /etc/os-release
lsb_release -a
echo "=> Current dir:"
echo $PWD
echo "=> Install prerequisites..."
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.1/install.sh | bash
source ~/.bashrc
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
nvm install node
nvm use node
apt install -y unzip
echo "npm version:"
npm -v
echo "node version:"
node -v
echo "unzip version:"
unzip -v
echo "=> Installing npm packages..."
npm i node-fetch

45
deploy/nginx/default.conf Normal file

@@ -0,0 +1,45 @@
server {
listen 80;
server_name _;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
client_body_buffer_size 128k;
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 300;
proxy_buffers 32 4k;
location / {
proxy_pass http://app:8080;
add_header 'Access-Control-Allow-Origin' '*';
}
location /data/upload {
alias /label-studio/data/media/upload/;
}
location /data/avatars {
alias /label-studio/data/media/avatars;
}
location /static {
alias /label-studio/label_studio/core/static_build/;
add_header 'Access-Control-Allow-Origin' '*';
}
location /label-studio-frontend {
alias /label-studio/label_studio/frontend/dist/lsf;
}
location /react-app {
alias /label-studio/label_studio/frontend/dist/react-app;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
}


@@ -0,0 +1,45 @@
server {
listen 80;
server_name _;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
client_body_buffer_size 128k;
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 300;
proxy_buffers 32 4k;
location / {
proxy_pass http://app:8080;
proxy_set_header Host $http_host;
}
location /label-studio/data/upload {
alias /label-studio/data/media/upload/;
}
location /label-studio/data/avatars {
alias /label-studio/data/media/avatars;
}
location /label-studio/static {
alias /label-studio/label_studio/core/static_build/;
add_header 'Access-Control-Allow-Origin' '*';
}
location /label-studio/label-studio-frontend {
alias /label-studio/label_studio/frontend/dist/lsf;
}
location /label-studio/react-app {
alias /label-studio/label_studio/frontend/dist/react-app;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
}

24
deploy/nginx/subpath.example.simple.conf Normal file

@@ -0,0 +1,24 @@
server {
listen 80;
server_name _;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
client_body_buffer_size 128k;
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 300;
proxy_buffers 32 4k;
client_max_body_size 2M;
location /foo {
rewrite /foo/(.*) /$1 break;
proxy_pass http://app:8080;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}

17
deploy/prebuild.sh Normal file

@@ -0,0 +1,17 @@
#!/usr/bin/env bash
set -e
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
echo "SCRIPT_DIR: ${SCRIPT_DIR}"
echo "=> Create production bundle..."
cd ${SCRIPT_DIR}/../label_studio/frontend
npm ci && npm run build:production
cd ${SCRIPT_DIR}
MANAGE=${SCRIPT_DIR}/../label_studio/manage.py
echo "=> Collect static..."
python3 $MANAGE collectstatic --no-input
echo "=> Create version file..."
python3 ${SCRIPT_DIR}/../label_studio/core/version.py

12
deploy/prebuild_wo_frontend.sh Normal file

@@ -0,0 +1,12 @@
#!/usr/bin/env bash
set -e
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
echo "SCRIPT_DIR: ${SCRIPT_DIR}"
MANAGE=${SCRIPT_DIR}/../label_studio/manage.py
echo "=> Collect static..."
python3 $MANAGE collectstatic --no-input
echo "=> Create version file..."
python3 ${SCRIPT_DIR}/../label_studio/core/version.py


@@ -0,0 +1,9 @@
pytest==6.2.2
pytest-cov==2.9.0
pytest-django==4.1.0
pytest-mock==1.10.3
requests-mock==1.5.2
pyyaml>=5.3.1
moto==1.3.16.dev122
tavern==1.14.0
fakeredis==1.5.0

47
deploy/requirements.txt Normal file

@@ -0,0 +1,47 @@
wheel
appdirs>=1.4.3
attr==0.3.1
attrs>=19.2.0
pyyaml>=5.3.1
azure-storage-blob>=12.6.0
boto==2.49.0
boto3==1.16.28
botocore==1.19.28
google-cloud-storage==1.28.1
Django==3.1.12
django_annoying==0.10.6
django_debug_toolbar==3.2
django_filter==2.4.0
django_model_utils==4.1.1
django_rq==2.3.2
django-cors-headers==3.6.0
django-extensions==3.1.0
django-rest-swagger==2.2.0
django-user-agents==0.4.0
django-ranged-fileresponse>=0.1.2
django-redis-cache==3.0.0
drf_dynamic_fields==0.3.0
drf_yasg==1.20.0
drf-generators==0.3.0
label-studio-converter==0.0.29
htmlmin==0.1.12
jsonschema==3.2.0
lockfile>=0.12.0
lxml>=4.2.5
numpy>=1.19.1
ordered_set==4.0.2
pandas>=0.24.0
protobuf>=3.15.5
psycopg2-binary==2.8.4
pydantic==1.7.3
python_dateutil==2.8.1
pytz==2019.3
requests>=2.22.0,<3
rq==1.7.0
rules==2.2
ujson>=3.0.0
xmljson==0.2.0
colorama>=0.4.4
boxing>=0.1.4
redis>=3.5.0
sentry-sdk>=1.1.0

5
deploy/start_label_studio.sh Normal file

@@ -0,0 +1,5 @@
#!/usr/bin/env bash
# see deploy/uwsgi.ini for details
# /usr/local/bin/uwsgi --ini /label-studio/deploy/uwsgi.ini
echo "Make simple Label Studio launch..."
label-studio

16
deploy/uwsgi.ini Normal file

@@ -0,0 +1,16 @@
[uwsgi]
chdir = /label-studio/label_studio
module = core.wsgi:application
master = true
processes = 4
harakiri = 1000
max-requests = 5000
vacuum = true
die-on-term = true
disable-logging = true
pidfile = /tmp/%n.pid
buffer-size = 65535
disable-write-exception = true
static-map = /static=core/static_build
log-master = true
protocol = http
socket = 0.0.0.0:8080

12
deploy/wait-for-postgres.sh Normal file

@@ -0,0 +1,12 @@
#!/bin/sh
# wait-for-postgres.sh
set -e
until PGPASSWORD=$POSTGRE_PASSWORD psql -h "$POSTGRE_HOST" -p $POSTGRE_PORT -U "$POSTGRE_USER" -c '\q'; do
>&2 echo "Postgres is unavailable - sleeping"
sleep 1
done
>&2 echo "Postgres is up - executing command"

50
docker-compose.yml Normal file

@@ -0,0 +1,50 @@
version: '3.3'

services:
  nginx:
    image: nginx:latest
    ports:
      - 8080:80
    depends_on:
      - app
    volumes:
      - static:/label-studio/label_studio:rw
      - ./mydata:/label-studio/data:rw
      - ./deploy/nginx/${NGINX_FILE:-default.conf}:/etc/nginx/conf.d/default.conf:ro
    command: nginx -g "daemon off;"

  app:
    stdin_open: true
    tty: true
    build: .
    image: heartexlabs/label-studio:latest
    expose:
      - "8080"
    depends_on:
      - db
    environment:
      - DJANGO_DB=default
      - POSTGRE_NAME=postgres
      - POSTGRE_USER=postgres
      - POSTGRE_PASSWORD=
      - POSTGRE_PORT=5432
      - POSTGRE_HOST=db
      - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
    volumes:
      - ./mydata:/label-studio/data:rw
      - static:/label-studio/label_studio:rw
    command: [ "./deploy/wait-for-postgres.sh", "bash", "/label-studio/deploy/start_label_studio.sh" ]

  db:
    image: postgres:11.5
    hostname: db
    restart: always
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data

volumes:
  static: {}

7
docs/.gitignore vendored Normal file

@@ -0,0 +1,7 @@
.DS_Store
Thumbs.db
db.json
*.log
node_modules/
public/
.deploy*/

46
docs/README.md Normal file

@@ -0,0 +1,46 @@
# Documentation of Label Studio
## Use and deploy Hexo
### Installing Dependencies
```shell
npm install
```
### Starting Development server
```shell
npm run server
```
Starts a local server. By default, this is at http://localhost:4000/.
### Deploying Documentation
```shell
npm run publish
```
## Deploy the docs locally using Hexo
To deploy the docs locally on your machine using Hexo, use the following steps.
### Prerequisites
- Install Hexo
- Clone the Label Studio GitHub repository
### Deploy the docs locally
In the label-studio/docs directory of the cloned repo, do the following:
1. (First time) Install required dependencies:
```shell
npm install
```
2. Start the Hexo server:
```shell
hexo serve
```
## Hexo Official Documentation
[https://hexo.io/docs/](https://hexo.io/docs/)

109
docs/_config.yml Normal file

@@ -0,0 +1,109 @@
# Hexo Configuration
## Docs: https://hexo.io/docs/configuration.html
## Source: https://github.com/hexojs/hexo/
# Site
title: Label Studio
subtitle: Data labeling, annotation and exploration tool
description: Label Studio is a multi-type data labeling and annotation tool with standardized output format
keywords: computer-vision, deep-learning, image-annotation, annotation-tool, annotation, labeling, labeling-tool, image-labeling, image-labeling-tool, boundingbox, image-classification, annotations, imagenet, semantic-segmentation, dataset, datasets, label-studio, label-region, data-labeling, text-annotation
author: Heartex
language: en
timezone:
google_analytics: UA-129877673-4
# URL
## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: https://labelstud.io/
root: /
permalink: :year/:month/:day/:title/
permalink_defaults:
# Directory
source_dir: source
public_dir: public
tag_dir: tags
archive_dir: archives
category_dir: categories
code_dir: downloads/code
i18n_dir: :lang
skip_render:
  source/playground/render.html
favicon: images/favicon.ico
# Writing
new_post_name: :title.md # File name of new posts
default_layout: post
titlecase: false # Transform title into titlecase
external_link: true # Open external links in new tab
filename_case: 0
render_drafts: false
post_asset_folder: true
relative_link: false
future: true
highlight:
  enable: true
  line_number: false
  tab_replace:
  hljs: true
  # auto_detect: true
  # wrap: true
# Markdown
## https://github.com/chjj/marked
markdown:
  gfm: true
  pedantic: false
  sanitize: false
  tables: true
  breaks: true
  smartLists: true
  smartypants: true
# Home page setting
# path: Root path for your blogs index page. (default = '')
# per_page: Posts displayed per page. (0 = disable pagination)
# order_by: Posts order. (Order by date descending by default)
index_generator:
  path: ''
  per_page: 10
  order_by: -date
# Category & Tag
default_category: uncategorized
category_map:
tag_map:
# Date / Time format
## Hexo uses Moment.js to parse and display date
## You can customize the date format as defined in
## http://momentjs.com/docs/#/displaying/format/
date_format: YYYY-MM-DD
time_format: HH:mm:ss
# Pagination
## Set per_page to 0 to disable pagination
per_page: 10
pagination_dir: page
# Extensions
## Plugins: https://hexo.io/plugins/
## Themes: https://hexo.io/themes/
theme: htx
# Deployment
deploy:
  type: aws-s3
  region: us-east-1
  bucket: labelstud.io

search:
  path: search.xml
  field: all
  content: true
# plugin to include md into md: https://github.com/tea3/hexo-include-markdown
include_markdown:
  dir: source/includes

4380
docs/package-lock.json generated Normal file

File diff suppressed because it is too large

41
docs/package.json Normal file

@@ -0,0 +1,41 @@
{
"name": "label-studio-documentation",
"version": "0.0.1",
"author": {
"name": "Heartex Labs",
"url": "https://github.com/heartexlabs"
},
"private": true,
"hexo": {
"version": "3.9.0"
},
"dependencies": {
"co": "^4.6.0",
"hexo": "^3.9.0",
"hexo-cli": "^3.1.0",
"hexo-deployer-git": "^2.0.0",
"hexo-generator-archive": "^0.1.5",
"hexo-generator-category": "^0.1.3",
"hexo-generator-index": "^0.2.1",
"hexo-generator-search": "^2.4.0",
"hexo-generator-tag": "^0.2.0",
"hexo-include": "^1.1.0",
"hexo-include-markdown": "^1.0.2",
"hexo-ipynb": "^0.2.4",
"hexo-jupyter-notebook": "0.0.3",
"hexo-renderer-ejs": "^0.3.1",
"hexo-renderer-marked": "^2.0.0",
"hexo-renderer-stylus": "^0.3.3",
"hexo-server": "^0.3.3",
"hexo-tag-details": "^0.1.7",
"save": "^2.4.0"
},
"scripts": {
"server": "hexo serve",
"publish": "hexo clean && hexo generate && hexo deploy",
"new": "hexo new"
},
"devDependencies": {
"hexo-deployer-aws-s3": "^1.0.2"
}
}

4
docs/scaffolds/draft.md Normal file

@@ -0,0 +1,4 @@
---
title: {{ title }}
tags:
---

4
docs/scaffolds/page.md Normal file

@@ -0,0 +1,4 @@
---
title: {{ title }}
date: {{ date }}
---

5
docs/scaffolds/post.md Normal file

@@ -0,0 +1,5 @@
---
title: {{ title }}
date: {{ date }}
tags:
---

1
docs/source/CNAME Normal file

@@ -0,0 +1 @@
labelstud.io


@@ -0,0 +1,242 @@
---
title: Evaluating Named Entity Recognition parsers with spaCy and Label Studio
type: blog
order: 96
image: /images/ner-blog/label_studio_and_spaCy_named_entity.png
meta_title: Evaluate NER parsers with spaCy and Label Studio
meta_description: Use Label Studio to evaluate named entity recognition parsers like spaCy and revise predictions by annotating a gold standard dataset for your data science and machine learning projects.
---
<img src="/images/ner-blog/label_studio_and_spaCy_named_entity.png" alt="Decorative graphic." class="gif-border" />
This tutorial helps you evaluate the accuracy of Named Entity Recognition (NER) taggers using Label Studio. Gather predictions from standard [spaCy](https://spacy.io/) language models for a dataset based on transcripts from the podcast This American Life, then use Label Studio to correct the transcripts and determine which model performed better, to focus future retraining efforts.
Named Entity Recognition (NER) parsers turn unstructured text into structured content by classifying information like organizations, dates, countries, professions, and others in the text. After a model detects those entities, they can be tagged and classified to allow for further analysis. In order to choose the best NER parser for your data analysis, you want to evaluate model performance against a relevant dataset.
You can use off-the-shelf parsers and NER taggers to handle named entity parsing and tagging, but the tagging accuracy of these tools on a specialized or small text corpus can often be low. Because of that, in many real-world settings you need to evaluate the accuracy of various NER taggers and fine-tune the most promising one to get better accuracy on your data.
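To make "evaluate the accuracy" concrete: a common approach is exact-match span scoring, where a predicted entity counts as correct only if both its character span and its label match a gold annotation. Here is a minimal sketch of that metric (a hypothetical helper, not part of this tutorial's code):
```python
def span_scores(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One of two predicted spans matches the gold set exactly:
print(span_scores(
    gold=[(0, 7, 'ORG'), (12, 18, 'DATE')],
    predicted=[(0, 7, 'ORG'), (25, 30, 'GPE')],
))  # -> (0.5, 0.5, 0.5)
```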
## Before you start
This tutorial assumes that you're comfortable with basic natural language processing (NLP) and machine learning (ML) terminology, such as the technical meaning of evaluation and the basic function and results provided by NER taggers.
To follow along with this tutorial, you need to do the following:
- set up a local Python environment
- use pip to install packages
- run Python code to evaluate the results
## Steps in this tutorial
1. Download a podcast transcript dataset from data.world.
2. Install spaCy, pandas and the relevant spaCy models.
3. Parse the downloaded dataset with spaCy.
4. Import the dataset predictions into Label Studio.
5. Correct the predicted NER tags using Label Studio to create a gold standard data sample.
6. Evaluate the model results and compare the predictions to the gold standard.
## Download the dataset and install spaCy and pandas
Download the [This American Life dataset from data.world](https://data.world/cjewell/this-american-life-transcripts). It contains transcripts of every episode since November 1995. You need a data.world account to download the dataset.
The text corpus is available in two files in CSV format. Download the `lines_clean.csv` version of the file, ordered by line of text. This file is formatted in a way that makes it easier to analyze the raw text of the podcast transcripts.
An excerpt of the file looks like the following:
<img width="1137" alt="The columns of the dataset" src="https://user-images.githubusercontent.com/2641205/111653437-34b00980-8808-11eb-9514-eeabdd9556a7.png">
### Install spaCy and pandas
Before you install spaCy, make sure that `pip` is installed and has been updated recently. Type the following in the command line:
```bash
python -m pip install -U pip
```
After you install or update pip, use pip to install the most recent [spaCy](https://spacy.io) version:
```bash
pip install -U spacy
```
You also need to install [pandas](https://pandas.pydata.org), which provides methods and data structures for dataset preprocessing to make spaCy processing possible. Use the following pip command:
```bash
pip install pandas
```
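The script in the next section loads the `en_core_web_sm` and `en_core_web_lg` models, which need to be downloaded once before the script runs. One way to do that from Python, sketched here with spaCy's built-in `download` helper (running `python -m spacy download <model>` from the command line works equally well):
```python
# Download the two English models this tutorial compares (a one-time step).
from spacy.cli import download

for model_name in ("en_core_web_sm", "en_core_web_lg"):
    download(model_name)
```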
### Import pre-annotated data
In order to evaluate and correct spaCy model performance for the keyword "Easter", generate and import spaCy model predictions into Label Studio.
This tutorial compares the prediction quality of the small and large English NER spaCy models, trained on written text from the web, for the podcast transcript dataset.
Run the following script to parse the dataset and output the spaCy model predictions as Label Studio tasks in JSON format:
```python
import spacy
import pandas as pd
import json
from itertools import groupby

# Load the spaCy models (see the download step above):
models = {
    'en_core_web_sm': spacy.load("en_core_web_sm"),
    'en_core_web_lg': spacy.load("en_core_web_lg")
}

# This function converts spaCy docs to the list of named entity spans in Label Studio compatible JSON format:
def doc_to_spans(doc):
    tokens = [(tok.text, tok.idx, tok.ent_type_) for tok in doc]
    results = []
    entities = set()
    for entity, group in groupby(tokens, key=lambda t: t[-1]):
        if not entity:
            continue
        group = list(group)
        _, start, _ = group[0]
        word, last, _ = group[-1]
        text = ' '.join(item[0] for item in group)
        end = last + len(word)
        results.append({
            'from_name': 'label',
            'to_name': 'text',
            'type': 'labels',
            'value': {
                'start': start,
                'end': end,
                'text': text,
                'labels': [entity]
            }
        })
        entities.add(entity)
    return results, entities

# Now load the dataset and include only lines containing "Easter ":
df = pd.read_csv('lines_clean.csv')
df = df[df['line_text'].str.contains("Easter ", na=False)]
print(df.head())
texts = df['line_text']

# Prepare Label Studio tasks in import JSON format with the model predictions:
entities = set()
tasks = []
for text in texts:
    predictions = []
    for model_name, nlp in models.items():
        doc = nlp(text)
        spans, ents = doc_to_spans(doc)
        entities |= ents
        predictions.append({'model_version': model_name, 'result': spans})
    tasks.append({
        'data': {'text': text},
        'predictions': predictions
    })

# Save Label Studio tasks.json
print(f'Save {len(tasks)} tasks to "tasks.json"')
with open('tasks.json', mode='w') as f:
    json.dump(tasks, f, indent=2)

# Save class labels as a txt file
print('Named entities are saved to "named_entities.txt"')
with open('named_entities.txt', mode='w') as f:
    f.write('\n'.join(sorted(entities)))
```
After running the script, you have two files:
- A `tasks.json` file with predictions from the large and small spaCy models to import into Label Studio.
- A `named_entities.txt` file that contains the list of entities to use as labels.
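Before importing, you can optionally sanity-check the generated file. Here is a minimal sketch, assuming the script above ran in the current directory:
```python
# Peek at the generated tasks to confirm both model versions are present.
import json

with open('tasks.json') as f:
    tasks = json.load(f)
print(f'{len(tasks)} tasks loaded')
print('Model versions in first task:',
      [p['model_version'] for p in tasks[0]['predictions']])
```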
## Correct the predicted Named Entities in Label Studio
To classify named entities, you need to create a dataset with gold standard labels that are accurate for your use case. To do that, use the open source data labeling tool, Label Studio.
### Install and start Label Studio
Install Label Studio in a virtual environment with `pip` using the following commands:
```bash
python3 -m venv env
source env/bin/activate
python -m pip install label-studio
```
After you install Label Studio, start the server and specify a project name:
```bash
label-studio start ner-tagging
```
Open Label Studio in your web browser at http://localhost:8080/ and create an account.
### Set up your Label Studio project
Open the `ner-tagging` project and do the following:
1. Click **Import** to add data.
2. Upload the `tasks.json` file.
<img src="/images/ner-blog/importdataNER.png" alt="Screenshot of the Label Studio data manager after importing data." class="gif-border" />
Next, set up the labeling interface with the spaCy NER labels to create a gold standard dataset.
1. From the project in Label Studio, click **Settings** and click **Labeling Interface**.
2. Select the **Named Entity Recognition** template and paste the contents of the `named_entities.txt` as the labels for the template.
3. Click **Save** to save the configuration and return to the project data.
<img src="/images/ner-blog/setupNERtemplate.gif" alt="Gif of the process of adding the named entity labels to Label Studio described in the preceding steps." class="gif-border" />
### Label your gold standard dataset in Label Studio
Click **Label** to start correcting the labeled instances of "Easter" in your data. As with all human-in-the-loop data labeling projects, the correct tag for "Easter" can be subjective and contextual. Some instances of "Easter" might be labeled with `EVT` to indicate an event, and others might be `PER` if the Easter Bunny is being discussed. Choose the labels that make the most sense for your use case.
1. For each task, review the model predictions and if needed, correct the label for the word "Easter". You can use the keyboard shortcuts to select the correct label, then highlight the word Easter in the text to label it.
2. Click **Submit** to save the new annotation and label the next task.
3. Continue until you've labeled all the tasks.
<img src="/images/ner-blog/spaCyModelPredictionsCorrected.gif" alt="Gif of the process of reviewing predictions and updating an annotation described in the preceding steps." class="gif-border" />
### Export your data to prepare to evaluate model accuracy
After you finish labeling the instances of Easter in the dataset manually, export the annotated data so that you can evaluate the model accuracy and determine which spaCy model you might want to retrain.
1. From your Label Studio project, click **Export**.
2. Select the **JSON** file format and download the data and annotations.
3. Rename the downloaded file `annotations.json`.
<img src="/images/ner-blog/exportdataNER.png" alt="Screenshot of export data modal with JSON selected as in step 2 of the preceding steps." class="gif-border" />
## Compare the spaCy model with the gold standard dataset
After you correct the predictions from the spaCy models and create a new gold standard dataset, you can compare the accuracy of each model programmatically against the gold standard you created.
Run this script to evaluate the exported annotations against the spaCy models:
```python
import json
from collections import defaultdict

tasks = json.load(open('annotations.json'))
model_hits = defaultdict(int)
for task in tasks:
    annotation_result = task['annotations'][0]['result']
    # drop the region ids so annotations compare equal to predictions
    for r in annotation_result:
        r.pop('id')
    for prediction in task['predictions']:
        model_hits[prediction['model_version']] += int(prediction['result'] == annotation_result)

num_tasks = len(tasks)
for model_name, num_hits in model_hits.items():
    acc = num_hits / num_tasks
    print(f'Accuracy for {model_name}: {acc:.2f}')
```
The script produces something like the following output:
```bash
Accuracy for en_core_web_sm: 0.03
Accuracy for en_core_web_lg: 0.41
```
Both models rarely predicted the `Easter` keyword correctly, so the accuracy values are quite low. However, it's still clear that the larger spaCy convolutional neural network (CNN) model performed significantly better than the smaller spaCy model in this case.
With these steps, it's clear that you can evaluate the performance results of two different models with just a few minutes of annotation, without spending too much time building complex evaluation pipelines with static datasets.
## What's next?
This is a simple example using only one specific corner case based on the `Easter` keyword. You can extend this example to monitor more complex semantics and assess more than two models at once. In a real-world use case, after correcting the labels for a large amount of data relevant to a project, you could then [retrain spaCy's models](https://spacy.io/usage/training) based on this new dataset.
You could also use the gold standard dataset to evaluate changes to models and determine the optimal parameters to use for a specific model to fine tune accuracy for a specific type of data. For example, evaluate and correct the predictions for one model against the gold standard dataset, then create a second model with a different set of parameters and evaluate that one against the gold standard dataset. Whichever model performs better is likely the better model for your data and use case.
Review the [example tutorials for creating a machine learning backend with Label Studio](/guide/ml_tutorials.html) to see how you can go further with automating model retraining and using Label Studio in your model development pipeline.

View File

@ -0,0 +1,294 @@
---
title: Improve Amazon Transcribe Audio Transcriptions with Label Studio
type: blog
order: 95
image: /images/aws-transcribe-blog/audio-transcription-illustration.png
meta_title: Improve Audio Transcriptions with Label Studio
meta_description: Use open source data labeling software Label Studio to improve audio transcriptions of customer support calls, video conference meetings, and other audio recordings.
---
<img src="/images/aws-transcribe-blog/audio-transcription-illustration.png" alt="Decorative graphic." class="gif-border" />
Audio transcription quality is important for accessibility, but also for ensuring the quality of a service you provide, such as customer support, or for accurately capturing the outcomes of a meeting. You might need to transcribe any of the following types of audio:
- A recording from a journalism, user research, or radio interview
- Customer support call recordings
- Depositions for legal cases
- Business meeting recordings
- Arbitration discussions
Quality is important, but speed matters too, especially when you need to transcribe a high volume of recordings in a short time frame.
## Why audio transcription quality matters
For many use cases, audio transcriptions must be completely accurate so that the patterns you search for in the transcribed content can be easily discovered. Accurate transcripts help you perform research, build a stronger legal case, improve your product more easily, ensure high quality customer support interactions, and reduce error in automated downstream analysis by machine learning models, such as sentiment analysis.
When high quality is crucial, human involvement in the transcription process is necessary. An expert transcriber brings field-specific knowledge and vernacular to a transcript, but it's difficult to scale high-quality human transcription: it can get expensive and time-consuming. Rather than shortchange the skills of an expert, you can use automated transcription to provide a shortcut. Then, the expert can focus on correcting inaccuracies in the transcript rather than performing the entire transcription manually.
With Label Studio, you can improve audio transcription quality at scale with an easy-to-use interface.
## How to improve audio transcription quality with Label Studio
<img src="/images/aws-transcribe-blog/AWS-Audio-Transcription-Scheme.png" alt="Diagram showing the flow of information from S3 buckets to Amazon Transcribe service to Label Studio then producing ground truth transcriptions that you can use for a named entity project, sentiment analysis project, or analytics software." class="gif-border" />
In this example tutorial, you can use the [Amazon Transcribe](https://aws.amazon.com/transcribe/) service to create an automated audio transcript of an interview and combine it with human intervention in Label Studio to produce a high quality audio transcript.
## Before you start
Before you start your transcription process, make sure you have the following:
- An AWS account
- Audio files stored in an Amazon S3 bucket, in a format supported by the Amazon Transcribe service
- The [AWS command line interface tool](https://aws.amazon.com/cli/) installed
- Python version 3.7 or higher installed
This example uses a [Sports Byline radio interview with Hank Aaron](https://www.loc.gov/item/sports000001/) from the Library of Congress digital collection.
## Configure programmatic access to Amazon Transcribe
Set up your Amazon S3 buckets with the audio files to allow the Amazon Transcribe service to read the contents programmatically.
### Create IAM user
Use the AWS CLI tool to create an identity to access the bucket on behalf of the Amazon Transcribe service. As the root user for the Amazon account, run the following command:
```bash
aws iam create-user --user-name test-transcribe-client
```
### Create AWS credentials for programmatic access
Create credentials for the user that you just created so that the transcription service can access the bucket. Run the following command and review the results:
```bash
aws iam create-access-key --user-name test-transcribe-client
{
    "AccessKey": {
        "UserName": "test-transcribe-client",
        "Status": "Active",
        "CreateDate": "2021-05-04T15:23:21Z",
        "SecretAccessKey": "soHt5...",
        "AccessKeyId": "AKIA..."
    }
}
```
Set the AWS access key ID and AWS secret access key using the `aws configure` command. See the [Amazon documentation on the aws configure command](https://docs.aws.amazon.com/cli/latest/reference/configure/index.html) for more.
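To confirm that the configured credentials are picked up correctly, a quick optional check (this assumes you have the `boto3` Python package installed) is to ask AWS STS for the current identity:
```python
# Optional sanity check: print the ARN of the identity that boto3 resolves
# from your configured credentials.
import boto3

print(boto3.client('sts').get_caller_identity()['Arn'])
```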
### Create a policy to access Amazon Transcribe
In order to allow the Amazon Transcribe service to access your bucket, you must set an IAM access policy. Treat the following policy as an example only, because it grants expansive resource access.
If you're following this example with non-production data, create a file with the following contents, replacing `BUCKET-NAME` with your bucket name, and name it `TranscribePolicy.json`:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "transcribe:*"
      ],
      "Resource": "*",
      "Effect": "Allow"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME",
        "arn:aws:s3:::BUCKET-NAME/*"
      ]
    }
  ]
}
```
Otherwise, set your [existing bucket policy to allow access to the Amazon Transcribe service](https://docs.aws.amazon.com/transcribe/latest/dg/security_iam_id-based-policy-examples.html).
After creating a policy, from the command line, run the following to attach the policy to the user:
```bash
aws iam put-user-policy --user-name test-transcribe-client --policy-name TestTranscribePolicy --policy-document file://TranscribePolicy.json
```
Now you have an IAM user that you can use to securely allow access to your S3 bucket with audio files. The Amazon Transcribe service can use that user to read the audio files and write transcripts into the S3 bucket.
## Create Label Studio tasks
After setting up programmatic access for the Amazon Transcribe service, create a Python script that does the following:
- Read the audio files in your S3 bucket.
- Transcribe the speech in the audio files by calling the Amazon Transcribe service to run a transcription job on each audio file from your S3 bucket.
- Create Label Studio tasks, saving the transcript output from the service as [pre-annotations](/guide/predictions.html).
Start by specifying the location of the audio files in your S3 bucket. Use the format `s3://BUCKET-NAME/<audio_file>` to specify the bucket objects. If your recordings are stored in a different file format than `mp3`, you'll need to make further changes to this script.
Copy and paste this example code into a text editor, and update it to match your S3 bucket name and audio file:
```python
audio_files = [
    's3://BUCKET-NAME/<audio_file>',
    's3://BUCKET-NAME/<audio_file>'
]
```
The next part of the code creates an Amazon Transcribe client using the credentials and IAM user that you defined earlier. Copy and paste this example code into the same file. If your S3 bucket is not in the `us-east-1` region, update this section of the code.
```python
import boto3
transcribe_client = boto3.client('transcribe', region_name='us-east-1')
```
The next part of the code retrieves the Amazon Transcribe job by the job name. Copy and paste this example code into the same file:
```python
from botocore.exceptions import ClientError

def get_job(client, job_name):
    """Check if current job already exists"""
    try:
        response = client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        return response
    except ClientError:
        return
```
The script then formats the results from the Amazon Transcribe service and stores them in the Label Studio JSON format as pre-annotated audio transcripts. Copy and paste this example code into the same file:
```python
import requests

def create_task(file_uri, job):
    try:
        download_uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
        results = requests.get(download_uri).json()['results']
        transcriptions = [r['transcript'] for r in results['transcripts']]
        # confidence score for the entire phrase is a mean of confidence for individual words
        confidence = sum(float(item['alternatives'][0]['confidence']) for item in results['items'] if item['type'] == 'pronunciation') / \
            sum(1.0 for item in results['items'] if item['type'] == 'pronunciation')
    except Exception as exc:
        print(exc)
    else:
        return {
            'data': {'audio': file_uri},
            'predictions': [{
                'result': [{
                    'from_name': 'transcription',
                    'to_name': 'audio',
                    'type': 'textarea',
                    'value': {'text': transcriptions}
                }],
                'score': confidence
            }]
        }
```
After preparing the format of the output, the final section of code creates an Amazon Transcribe job and saves the results in [Label Studio JSON pre-annotation format](/guide/predictions.html). Copy and paste this example code into the same file, updating the `media_format` and [`language_code`](https://docs.aws.amazon.com/transcribe/latest/dg/what-is-transcribe.html) variables if needed:
```python
import time

def transcribe_file(job_name, file_uri, transcribe_client, media_format='mp3', language_code='en-US'):
    job = get_job(transcribe_client, job_name)
    if job:
        print(f'Transcription job {job_name} already exists.')
        return create_task(file_uri, job)
    print(f'Start transcription job {job_name}')
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': file_uri},
        MediaFormat=media_format,
        LanguageCode=language_code
    )
    max_tries = 60
    while max_tries > 0:
        max_tries -= 1
        job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        job_status = job['TranscriptionJob']['TranscriptionJobStatus']
        if job_status in ['COMPLETED', 'FAILED']:
            print(f'Job {job_name} is {job_status}.')
            if job_status == 'COMPLETED':
                return create_task(file_uri, job)
            # stop polling once the job has failed
            break
        else:
            print(f'Waiting for {job_name}. Current status is {job_status}.')
        time.sleep(10)
```
After the transcribed data is retrieved and reformatted to be imported into Label Studio, the code saves the results as a file called `tasks.json` in the `data/aws-transcribe-output` directory. Copy and paste this example code into the same file and update the path in the `output_file` variable to specify a location on your machine:
```python
import os
import json

tasks = []
for audio_file in audio_files:
    job_name = os.path.splitext(os.path.basename(audio_file))[0]
    task = transcribe_file(job_name, audio_file, transcribe_client)
    if task:
        tasks.append(task)

output_file = 'data/aws-transcribe-output/tasks.json'
with open(output_file, mode='w') as f:
    json.dump(tasks, f, indent=2)
print(f'Congrats! Now import {output_file} into Label Studio.')
```
Save the script as a Python file called `transcribe_tutorial.py` and run it from the command line:
```bash
python transcribe_tutorial.py
```
When you open the `tasks.json` file, you see something like the following example:
```json
[{
  "data": {
    "audio": "s3://BUCKET-NAME/sportsbyline_hankaaron.mp3"
  },
  "predictions": [
    {
      "result": [
        {
          "from_name": "transcription",
          "to_name": "audio",
          "type": "textarea",
          "value": {
            "text": [
"Yeah, this is America's sports talk show, Sports Byline USa Here's Ron bar. Yeah, yeah. Hank Aaron joins us here on Sports Byline USA. The Hall of Famer and also the home run king who had 755 home runs and back on april the 8th, 1974. He broke babe Ruth's all time home run record Of 714. Good evening, nice to have you back here on sports byline. It's my pleasure. Let's talk a little bit first about your career, 23 years in the majors. When you reflect back, what type of perspective do you put on your baseball career? I don't know, I think it was I was satisfied with it. You know, I don't think that I look back at my career and can find any fault with it. You know, I did just about everything I wanted to do. I accomplished all of my goals and so I I feel like it was a mission well accomplished. Did you ever think back when you were a kid growing up that you would do what you were able to do? I mean what was the dream of hank Aaron when he was a kid? No, I didn't think about it at all, you know, all I wanted to do back then of course and I was just talking to somebody about it was to try to get it in five years and that would have been, I would have been invested in the pension fund and that's mostly what ball players thought about. You know, You didn't think in terms of playing 2020, some years like I did, you know, but I'm also glad that I did. I think one thing that fans don't realize hank is that mobile Alabama turned out an awful lot of good baseball players. It seemed like it was the breeding ground for great players, wasn't it? Well, I tell you, you're absolutely right about that run. We had, we had that one time, I think that was as many as eight or nine major league players that was playing in the big leagues, you know, and they were all good. In fact Billy Williams and McCovey, they all went into the Hall of Fame. So, you know, you're talking about myself, Billy Williams to cover and satchel paige all in the Hall of Fame. So, you know, mobile doesn't have anything to be ashamed of. I think we turn loose our share of athletes. What was your first exposure to baseball hank? I don't know. I think my first exposure to baseball was just baseball itself was just playing in the negro league. I played softball in high school but never played baseball. So I started playing, actually start playing baseball when I was uh in the negro american league when I think back and I've talked with Willie Mays about the negro leagues and how special that time was..."
            ]
          }
        }
      ],
      "score": 0.9722586794792298
    }
  ]
}]
```
The transcript has been shortened for readability.
## Set up Label Studio for audio transcription
1. [Install Label Studio](/guide/install.html) using your desired method.
2. Because the transcribed pre-annotations specify `s3://` URLs to the audio files in your S3 bucket, set an environment variable so that Label Studio creates presigned URLs to access the audio using the AWS credentials and the S3 access policy that you configured with the `aws configure` command. Expose the following variable on the machine with Label Studio installed:
```bash
export USE_DEFAULT_S3_STORAGE=true
```
3. Start Label Studio:
```bash
label-studio start
```
4. On the Label Studio UI, click **Create** to create an **Audio Transcription** project.
5. On the **Data Import** tab, upload the `tasks.json` file with pre-annotations from the Amazon Transcribe service.
6. On the **Labeling Setup** tab, select the **Automatic Speech Recognition** template.
7. Save your project.
<img src="/images/transcribe-blog/project_homepage.png" alt="Screenshot of the Label Studio data manager showing the pre-annotated task loaded into an Audio Transcription project." class="gif-border" />
If you want, you can sort the tasks by prediction score so that you can review the least confident results first.
8. Click **Label** to review and correct the transcript from the Amazon Transcribe service as necessary. When you first open the task, you see the predicted transcript and the audio file.
<img src="/images/transcribe-blog/pre-annotation_state.png" alt="Screenshot of the Label Studio label stream workflow, with the transcript generated by Amazon Transcribe visible as a clickable text box under a play button so that you can play the audio and correct the transcript." class="gif-border" />
9. Click the pencil icon at the bottom of the automatically-generated transcript to make corrections. Click **Submit** when you're done to save your updated transcript as an annotation.
<img src="/images/transcribe-blog/correcting_annotation.png" alt="Screenshot of the Label Studio label stream workflow, with the transcript in the process of being edited as a text box." class="gif-border" />
When you're finished reviewing and correcting transcripts, click **Export** to export the annotations in your desired format.
## What's next
By automatically transcribing a radio interview and then manually correcting the transcript using Label Studio, you can perform reliable research and trustworthy analysis on audio recordings. By combining existing automated transcription services like Amazon Transcribe with subject matter experts using Label Studio, you can quickly discover patterns and search for important information in transcribed recordings.
Beyond this example, when you improve audio transcripts with Label Studio, you can more easily trust the results of your machine learning models. You can reduce error in any automated downstream analysis that you perform, such as _sentiment analysis_ or another contextual analysis. AWS Machine Learning provides an example of [analyzing contact center calls using Amazon Transcribe and Amazon Comprehend](https://aws.amazon.com/blogs/machine-learning/analyzing-contact-center-calls-part-1-use-amazon-transcribe-and-amazon-comprehend-to-analyze-customer-sentiment/), and with Label Studio as part of that pipeline, you can get higher quality, more robust call center analytics from the Amazon Comprehend service.

View File

@ -0,0 +1,229 @@
---
title: Improve OCR quality for receipt processing with Tesseract and Label Studio
type: blog
order: 95
image: /images/release-110/OCR-example.gif
meta_title: Improve OCR quality for receipt processing with Tesseract and Label Studio
meta_description: Use open source data labeling software Label Studio to improve optical character recognition (OCR) results for receipts, invoices, menus, signs, and other important images processed with Tesseract and Python.
---
Performing accurate optical character recognition (OCR) on images and PDFs is a challenging task, but one with many business applications, like transcribing receipts and invoices, or even for digitizing archival documents for record-keeping and research.
The open source [Tesseract](https://github.com/tesseract-ocr/tesseract) library is a great option for performing OCR on images, but it can be difficult to tune an automated system for your particular OCR use case.
If you've already done what you can to [treat the input](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) and improve the likelihood of good quality output from Tesseract, focus on improving the accuracy of the results with a data labeling solution.
With Label Studio, you can import the output from Tesseract and use the Label Studio UI to correct the recognized text and produce a clean OCR dataset that you can use for model training or other data analysis.
## Steps to process receipt images with Tesseract and Label Studio
A common use case for OCR is recognizing text in receipts collected by an expense application. Follow these steps to process receipt images with Tesseract and Python and correct the results with Label Studio.
1. Get the data you want to process.
2. Write a Python script to process the images with Tesseract and output them in Label Studio format.
3. Install Label Studio and set up your project.
4. Correct the OCR results in the Label Studio UI.
5. Export the final results to train a machine learning model or to use for data analysis.
<br/><img src="/images/ocr-blog/OCR-view-predictions-and-labeling.png" alt="Screenshot of the data manager quick view to view the predicted text and bounding boxes from Tesseract OCR on a receipt image in the Label Studio UI." class="gif-border" width="800px" height="429px" />
You need some basic familiarity with Python to follow this example.
## Acquire your dataset
This example uses a receipt dataset made available under a Creative Commons Attribution license: "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" by Park, Seunghyun; Shin, Seung; Lee, Bado; Lee, Junyeop; Surh, Jaeheung; Seo, Minjoon; and Lee, Hwalsuk, presented at the Document Intelligence Workshop at Neural Information Processing Systems in 2019. If you have your own dataset of receipts or other images, you can follow along with this blog post and use the same script with some changes to the image file type and location.
Process the training images in the dataset with Tesseract and prepare to correct the results with Label Studio.
1. Download the CORD-1k-002.zip file from the [link in the GitHub repository for CORD](https://github.com/clovaai/cord).
2. Expand the zip file and locate the **images** folder in the `train` directory.
3. To make the images available to Label Studio, run the following from the command line, updating the file path as needed:
```bash
./serve_local_files.sh ~/Downloads/CORD/train/image
```
This runs a web server locally to generate URLs so that you can [import data from a local directory](/guide/tasks.html#Import-data-from-a-local-directory) into Label Studio. Because the Tesseract script handles the image URLs, ignore the `files.txt` file created by the shell script.
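If you don't have the `serve_local_files.sh` script handy, a rough stand-in using only the Python standard library might look like the following sketch. It only serves files and skips anything else the shell script does, such as writing `files.txt`:
```python
# Serve the receipt images on http://localhost:8081 so Label Studio can load them.
# Run this from the directory that contains the `image` folder; stop with Ctrl+C.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory='image')
HTTPServer(('localhost', 8081), handler).serve_forever()
```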
## Write a script to process the images
Now that you have a dataset to work with, write a Python script to process the images in the receipt dataset with Tesseract OCR and return the recognized text, confidence scores for each image and each region, and the bounding boxes for each section of recognized text. This script saves the Tesseract output to a file in the [Label Studio JSON format for predicted annotations](/guide/predictions.html).
### Install and import necessary libraries
You need to [install Tesseract](https://tesseract-ocr.github.io/tessdoc/Installation.html) and the [`pytesseract`](https://pypi.org/project/pytesseract/) library.
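To verify the installation before going further, you can optionally ask `pytesseract` for the Tesseract version it finds on your system:
```python
# Prints the version of the Tesseract binary that pytesseract will call;
# raises TesseractNotFoundError if Tesseract isn't on your PATH.
import pytesseract

print(pytesseract.get_tesseract_version())
```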
Create a Python file named `tesseractocr.py` and place the following imports at the top:
```python
import os
import json
import pytesseract
from PIL import Image
from pathlib import Path
from uuid import uuid4
```
### Decide how to process the images
The `pytesseract` library lets you specify the level of detail that you want to use for the bounding boxes. You can create page-level bounding boxes, where each page has one bounding box with all the recognized text, or have a bounding box for each block of text, paragraph of text, line of text, or one bounding box for each word.
Add the following to your `tesseractocr.py` script:
```python
# tesseract output levels for the level of detail for the bounding boxes
LEVELS = {
    'page_num': 1,
    'block_num': 2,
    'par_num': 3,
    'line_num': 4,
    'word_num': 5
}
```
### Reference the images in the script
Label Studio handles images as URLs, so define a function that maps the images in the receipt dataset to URLs that Label Studio can open. If you use the script to run a web server locally as recommended, the image URLs are formatted like `http://localhost:8081/filename.png`.
Add the following to your `tesseractocr.py` script:
```python
def create_image_url(filepath):
    """
    Label Studio requires image URLs, so this defines the mapping from filesystem to URLs
    if you use ./serve_local_files.sh <my-images-dir>, the image URLs are localhost:8081/filename.png
    Otherwise you can build links like /data/upload/filename.png to refer to the files
    """
    filename = os.path.basename(filepath)
    return f'http://localhost:8081/{filename}'
```
If you need to use a different format, for example if you chose to upload the files directly to Label Studio or import them using the storage sync options, update this function to return references like `/data/upload/filename.png` to refer to the files.
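For example, a variant for directly uploaded files might look like the following sketch; the exact `/data/upload/` prefix is an assumption, so check where your Label Studio instance stores uploads:
```python
import os

def create_uploaded_image_url(filepath):
    """Hypothetical mapping for files uploaded straight into Label Studio."""
    filename = os.path.basename(filepath)
    return f'/data/upload/{filename}'
```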
### Define how to convert the results to Label Studio JSON format
After you decide how to process the images and prepare them for Label Studio, construct the next portion of the script to define how to retrieve the results from Tesseract and transform the output into the Label Studio [JSON format for predicted annotations](/guide/predictions.html).
Add the following to your `tesseractocr.py` script:
```python
def convert_to_ls(image, tesseract_output, per_level='block_num'):
    """
    :param image: PIL image object
    :param tesseract_output: the output from tesseract
    :param per_level: control the granularity of bboxes from tesseract
    :return: tasks.json ready to be imported into Label Studio with "Optical Character Recognition" template
    """
    image_width, image_height = image.size
    per_level_idx = LEVELS[per_level]
    results = []
    all_scores = []
    for i, level_idx in enumerate(tesseract_output['level']):
        if level_idx == per_level_idx:
            bbox = {
                'x': 100 * tesseract_output['left'][i] / image_width,
                'y': 100 * tesseract_output['top'][i] / image_height,
                'width': 100 * tesseract_output['width'][i] / image_width,
                'height': 100 * tesseract_output['height'][i] / image_height,
                'rotation': 0
            }
            words, confidences = [], []
            for j, curr_id in enumerate(tesseract_output[per_level]):
                if curr_id != tesseract_output[per_level][i]:
                    continue
                word = tesseract_output['text'][j]
                confidence = tesseract_output['conf'][j]
                words.append(word)
                if confidence != '-1':
                    confidences.append(float(confidence) / 100.)
            text = ' '.join(words).strip()
            if not text:
                continue
            region_id = str(uuid4())[:10]
            score = sum(confidences) / len(confidences) if confidences else 0
            bbox_result = {
                'id': region_id, 'from_name': 'bbox', 'to_name': 'image', 'type': 'rectangle',
                'value': bbox}
            transcription_result = {
                'id': region_id, 'from_name': 'transcription', 'to_name': 'image', 'type': 'textarea',
                'value': dict(text=[text], **bbox), 'score': score}
            results.extend([bbox_result, transcription_result])
            all_scores.append(score)
    return {
        'data': {
            'ocr': create_image_url(image.filename)
        },
        'predictions': [{
            'result': results,
            'score': sum(all_scores) / len(all_scores) if all_scores else 0
        }]
    }
```
This section of the script defines the function to retrieve bounding boxes for each block of text and convert the pixel sizes for the bounding boxes into the [image annotation units expected by Label Studio](/guide/predictions.html#Units_for_image_annotations).
Tesseract also produces a confidence score for each word that it processes. This script averages that confidence score for each block of recognized text so that you can review lower-confidence predictions before other regions, and averages all the scores for all regions for each image to provide an overall prediction score for each task. Reviewing lower-confidence predictions first lets you focus on the text regions that are least likely to be correct and use your annotation time wisely.
### Process the images and export the results to a file
In the last part of the script, call the functions to process the images with Tesseract and convert the output to Label Studio JSON format for predictions. Finally, export the results to a file that you can add to Label Studio for reviewing and correcting the recognized text regions.
Add the following to your `tesseractocr.py` script:
```python
tasks = []
# collect the receipt images from the image directory
for f in Path('image').glob('*.png'):
    with Image.open(f.absolute()) as image:
        tesseract_output = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
        task = convert_to_ls(image, tesseract_output, per_level='block_num')
        tasks.append(task)

# create a file to import into Label Studio
with open('ocr_tasks.json', mode='w') as f:
    json.dump(tasks, f, indent=2)
```
## Run the script to collect results
Save your script and run it from the command line to process the images.
The script expects the `image` directory to be a subdirectory of the one where you run the script. If you saved the dataset to your `Downloads` folder, either change to the `train` directory and save and run the script there, or move the `image` directory to the same directory as the script.
From the command line, run the following:
```bash
python3 tesseractocr.py
```
The script takes a minute or so to run and process the images. When it finishes, a file called `ocr_tasks.json` is saved in the directory where you run the script.
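As an optional check before importing, you can confirm how many tasks were written and find the lowest prediction score. A minimal sketch, assuming the run above succeeded:
```python
# Inspect the generated tasks and their overall prediction scores.
import json

with open('ocr_tasks.json') as f:
    tasks = json.load(f)
scores = [t['predictions'][0]['score'] for t in tasks]
print(f'{len(tasks)} tasks; lowest prediction score: {min(scores):.2f}')
```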
## Correct the results in Label Studio
After you've processed the images with Tesseract and Python, you can start working with the results in Label Studio to make any adjustments and validate the accuracy of the OCR.
### Install and set up Label Studio
1. [Install Label Studio](/guide/install.html) using your preferred method. If you install Label Studio using Docker, [set environment variables to access the image files](/guide/start.html#Run-Label-Studio-on-Docker-and-use-local-storage).
2. [Create a project](/guide/setup_project.html) called `OCR Receipts` to manage the results.
3. Import the `ocr_tasks.json` file from the `tesseractocr.py` script run.
4. Select the **Optical Character Recognition** template for your labeling interface. If you want, change the region labels to describe product names and prices, or leave the template with the default region labels of Text and Handwriting.
5. Adjust the data manager fields to show the **Prediction score** on the UI and sort the prediction scores to view the lowest score images first.
<br/><img src="/images/ocr-blog/OCR-low-prediction-score-sort.png" alt="Screenshot of the data manager in Label Studio showing the OCR images sorted by prediction score." class="gif-border" width="800px" height="429px"/>
### Correct the recognized text
After you set up your project and import the results from Tesseract, start labeling the tasks to review the OCR predictions and make any necessary changes.
1. Click **Label all Tasks** and start correcting the recognized text regions.
2. Locate a region in the sidebar and click to update the text.
<img src="/images/ocr-blog/OCR-correct-single-region.gif" alt="Gif of updating the recognized text for a region to match teh text on the image in the Label Studio UI." class="gif-border" width="800px" height="500px"/>
3. Click a label to identify the region as Text, a Product Name, or a Price.
4. If needed, add additional regions.
<img src="/images/ocr-blog/OCR-add-and-label-new-regions.gif" alt="Gif of adding new regions to the image and transcribing and labeling the text in the Label Studio UI." class="gif-border" width="800px" height="511px" />
5. When you're done, click **Update** or **Submit** to move on to the next task.
<img src="/images/ocr-blog/OCR-update-and-label-regions.gif" alt="Gif of updating recognized text and adjusting a bounding box in the Label Studio UI." class="gif-border" width="800px" height="500px" />
Repeat the labeling steps for every image in the dataset.
## Takeaways for OCR with Tesseract and Label Studio
OCR is a valuable tool when automating tasks that are time-consuming and error-prone for humans. Take advantage of existing OCR tools to save time and money when building your own machine learning model. You can use a tool like Label Studio to improve the overall results of the OCR tools so that you can have confidence in the resulting dataset.
This example tutorial showcases the capabilities of Label Studio with OCR use cases, especially using Tesseract and Python to process the images and data. You can reuse this script to process other types of images with OCR, such as parsing screenshots, recording invoices, identifying addresses, reading street signs and house numbers, transcribing archival documents, and more.
You can even go to the next level and further process the text recognized by Tesseract and corrected in Label Studio by performing named entity recognition to parse the meaning or sentiment of the recognized text.

View File

@ -0,0 +1,72 @@
---
title: 10 important considerations for NLP labeling
type: blog
order: 95
image: /images/nlp-blog/nlp-labeling-relations-de.png
meta_title: 10 important considerations for NLP labeling
meta_description: The top 10 important considerations for NLP labeling and functionality in labeling tools for natural language processing machine learning projects.
---
When you perform natural language processing (NLP) labeling to create machine learning models, you want the data labeling tool you choose to have specific functionality so that you can create a high-quality dataset to train or fine-tune your model.
We've identified the ten most important features to consider when choosing an NLP labeling tool.
## 1. Label text spans
You must be able to label text spans with the tool that you choose for NLP labeling. Whether you're trying to label entire words, sentences, or some other span of text, this is the most basic feature you need.
<br/><img src="/images/nlp-blog/nlp-no-relations.png" alt="Screenshot of NLP labeling being performed by highlighting named entities and assigning labels to words." class="gif-border" />
## 2. Label intersecting and overlapping text
Depending on the type of NLP labeling that you're performing, you likely need to be able to label intersecting and overlapping text, for example, to label both full words and morphemes.
<br/><img src="/images/nlp-blog/nlp-overlapping-regions.png" alt="Screenshot of overlapping labeled named entities, with text Mein Mann hat einen Auto gekauft, das kaputt ist, where Mein Mann and Mann are both labeled Person and gekauft and kauft are both labeled action. " class="gif-border" />
## 3. Label partial words
For some NLP labeling use cases, you might need to label partial words, for example, when training a model to recognize prefixes, suffixes, or compound words. If you're limited to text spans that are at least a full word, you won't be able to perform your labeling tasks.
<br/><img src="/images/nlp-blog/nlp-labeled-regions.png" alt="Screenshot of NLP labeled regions showing the labels applied to different partial word forms, such as gekauft and kauft." class="gif-border" />
## 4. Identify and label relations between text spans
For most NLP labeling tasks, you need to be able to identify relations between text spans so that you can train your natural language understanding (NLU) model on those relations.
More importantly, after you identify those relations as part of your NLP labeling, you need to be able to label those relations. Define what makes those relations relevant to your NLU model by labeling subject-verb agreement interactions, or labeling which noun a relative pronoun refers to.
<br/><img src="/images/nlp-blog/nlp-defining-labeling-relations.png" alt="Screenshot of one way to apply a relation between named entities in NLP labeling, along with the label that you can apply to the relationship." class="gif-border" />
## 5. Define consistent labels
Whether you're defining an ontology or terminology, you want the labels that annotators apply to the text to be consistent across all NLP labeling activities. Make sure that the schema that you define for labeling requires a consistent list of labels to use across all labeling tasks.
<br/><img src="/images/nlp-blog/nlp-labeling-taxonomy.png" alt="Screenshot of the defined labeling taxonomy that you can define for a specific set of text." class="gif-border" />
## 6. Support for many classes
You want your labels to be consistent, but you also want the flexibility to choose from a wide array of classification options. If you can add dozens or even hundreds of labels for annotators to choose from, and if those annotators can filter and search for the correct labels when annotating the data, you can complete your large scale NLP labeling project faster.
## 7. Semantic classification
Beyond labeling portions of the text as a specific part of speech or morpheme, you might also want to classify the semantic meaning of the text. A tool that allows you to classify meaning, as well as highlight and label text spans, is a flexible tool that can support a number of NLP labeling tasks.
## 8. Save draft annotations
When labeling long text samples, you want the ability to take a break from labeling and return where you left off. If you can save draft annotations, it is easier to complete more time-consuming NLP labeling tasks.
<br/><img src="/images/nlp-blog/nlp-draft-annotations.png" alt="Screenshot of NLP labeling user interface with UI text reading not submitted draft to make it clear that the annotation is still in draft form." class="gif-border" />
## 9. Support multiple annotators
In order to ensure high-quality annotations, you want to make sure that whichever tool you choose for your NLP labeling, it supports multiple annotators. Often, the more people that label a task, the higher quality the annotations are likely to be.
## 10. Non-English labeling capabilities
Not all natural language processing happens in English. A tool that can support non-English languages and special characters is important to capture the global and expressive nature of text-based communication in the modern world.
<br/><img src="/images/nlp-blog/nlp-labeling-relations-de.png" alt="Screenshot of NLP labeling user interface with UI text reading not submitted draft to make it clear that the annotation is still in draft form." class="gif-border" />
## Conclusion
There are a lot of data labeling tools available. It's important to carefully assess the best one to help you perform NLP labeling. We hope that this list of important functionality can help you define the requirements for your next NLP labeling tool!

244
docs/source/blog/index.html Normal file
View File

@ -0,0 +1,244 @@
---
type: blog
order: 101
meta_title: Best Practices for Data Labeling & Annotation
meta_description: Data labeling and annotation articles and best practices for machine learning and data science projects from the experts behind Label Studio.
---
<style>
  .sidebar {
    display: none;
  }
</style>
<link rel="stylesheet" href="/blog/styles.css">
<div class="blog-body">
<div class="grid">
<!-- Transfer learning with PyTorch -->
<!-- <div class="column"> -->
<!-- <a href="/notebook/transfer-learning-tutorial.html"> -->
<!-- <div class="card"> -->
<!-- <div class="image-wrap"> -->
<!-- <div style="background-image: url(/images/release-050-views.png)" class="image"></div> -->
<!-- </div> -->
<!-- <div class="category">tutorial</div> -->
<!-- <div class="desc">31 Mar 2020, 5 min read</div> -->
<!-- <div class="title">Transfer learning with PyTorch</div> -->
<!-- </div> -->
<!-- </a> -->
<!-- </div> -->
<!-- OCR blog -->
<div class="column">
<a href="/blog/Improve-OCR-quality-with-Tesseract-and-Label-Studio.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-110/OCR-example.gif); background-size:cover; background-position: 0px -65px;" class="image"></div>
</div>
<div class="category">article</div>
<div class="desc">13 July 2021, 8 min read</div>
<div class="title">Improve OCR quality for receipt processing with Tesseract and Label Studio</div>
</div>
</a>
</div>
<!-- Release 1.1.0 -->
<div class="column">
<a href="/blog/release-110.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-110/multi-labeling.gif); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">30 June 2021, 3 min read</div>
<div class="title">Label Studio v1.1 is now available!</div>
</div>
</a>
</div>
<!-- Publish NLP labeling blog -->
<div class="column">
<a href="/blog/Ten-important-considerations-for-NLP-labeling.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/nlp-blog/nlp-labeling-relations-de.png); background-size:cover; background-position: center" class="image"></div>
</div>
<div class="category">article</div>
<div class="desc">17 June 2021, 4 min read</div>
<div class="title">10 important considerations for NLP labeling</div>
</div>
</a>
</div>
<!-- Publish Amazon Transcribe blog -->
<div class="column">
<a href="/blog/Improve-Audio-Transcriptions-with-Label-Studio.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/aws-transcribe-blog/audio-transcription-illustration.png); background-size:cover; background-position: left" class="image"></div>
</div>
<div class="category">article</div>
<div class="desc">10 May 2021, 11 min read</div>
<div class="title">Improve Amazon Transcribe Audio Transcriptions with Label Studio</div>
</div>
</a>
</div>
<!-- Publish NER blog -->
<div class="column">
<a href="/blog/Evaluating-Named-Entity-Recognition-parsers-with-spaCy-and-Label-Studio.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/ner-blog/label_studio_and_spaCy_named_entity.png); background-size:cover" class="image"></div>
</div>
<div class="category">article</div>
<div class="desc">3 May 2021, 12 min read</div>
<div class="title">Evaluating Named Entity Recognition parsers with spaCy and Label Studio</div>
</div>
</a>
</div>
<!-- Release 1.0.0 -->
<div class="column">
<a href="/blog/release-100.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-100/announcement.png); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">15 March 2021, 4 min read</div>
<div class="title">Label Studio Hits v1.0!</div>
</div>
</a>
</div>
<!-- News letters -->
<div class="column">
<div class="card">
<div class="image-wrap highlight">
<center>
<div class="title">Label Studio Newsletter</div>
<p>We send a periodic newsletter announcing new features, along with interesting ML papers, innovative labeling techniques, and fun stories.</p>
<iframe src="https://labelstudio.substack.com/embed" frameborder="0" scrolling="no" style="width:90%"></iframe>
</center>
</div>
</div>
</div>
<!-- Release 0.9.0 -->
<div class="column">
<a href="/blog/release-090-data-management-improvements.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-090/aerial_training_data_management.png); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">21 Jan 2021, 8 min read</div>
<div class="title">Label Studio 0.9.0 Release - Explore and improve your datasets</div>
</div>
</a>
</div>
<!-- Release 0.8.0 -->
<div class="column">
<a href="/blog/release-080-time-series-labeling.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-080/time-series-labeling-with-multiple-channels.png); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">26 Oct 2020, 7 min read</div>
<div class="title">Label Studio 0.8.0 Release - Time Series is Everywhere!</div>
</div>
</a>
</div>
<!-- Release 0.7.0 -->
<div class="column">
<a href="/blog/release-070-cloud-storage-enablement.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-070/s3-mascot-04.png); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">29 May 2020, 5 min read</div>
<div class="title">Label Studio 0.7.0 Release - Cloud Storage Enablement</div>
</div>
</a>
</div>
<!-- Release 0.6.0 -->
<div class="column">
<a href="/blog/release-060-nested-data-labeling.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-060/nested_labeling.gif); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">8 May 2020, 7 min read</div>
<div class="title">Label Studio 0.6.0 Release - Nested Data Labeling</div>
</div>
</a>
</div>
<!-- Release 0.5.0 -->
<div class="column">
<a href="/blog/release-050.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-050-views.png)" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">9 Mar 2020, 5 min read</div>
<div class="title">Label Studio 0.5.0 Release - Relations</div>
</div>
</a>
</div>
<!-- Bert -->
<div class="column">
<a href="https://towardsdatascience.com/how-to-finetune-bert-to-classify-your-slack-chats-without-coding-3a7002936bcf">
<div class="card">
<div class="image-wrap">
<div class="image" style="background-image: url(/images/ls-bert-slack.jpg)"></div>
</div>
<div class="category">tutorial</div>
<div class="desc">18 Feb 2020, 5 min read</div>
<div class="title">How to fine-tune BERT to classify your Slack chats without coding</div>
</div>
</a>
</div>
<!-- Intro LS -->
<div class="column">
<a href="https://towardsdatascience.com/introducing-label-studio-a-swiss-army-knife-of-data-labeling-140c1be92881">
<div class="card">
<div class="image-wrap">
<div class="image" style="background-image: url(/images/annotation_examples.gif)"></div>
</div>
<div class="category">article</div>
<div class="desc">28 Jan 2020, 8 min read</div>
<div class="title">Introducing Label Studio, a swiss army knife of data labeling</div>
</div>
</a>
</div>
<!-- AL -->
<div class="column">
<a href="https://towardsdatascience.com/learn-faster-with-smarter-data-labeling-15d0272614c4">
<div class="card">
<div class="image-wrap">
<div class="image" style="background-image: url(/images/AL-learn-fast.png)"></div>
</div>
<div class="category">article</div>
<div class="desc">21 Aug 2019, 8 min read</div>
<div class="title">Learn faster with smarter data labeling</div>
</div>
</a>
</div>
</div>
</div>

View File

@ -0,0 +1,146 @@
---
title: Label Studio Release Notes 0.5.0
type: blog
order: 102
meta_title: Label Studio Release Notes 0.5.0
meta_description: Label Studio Release 0.5.0 includes image segmentation, relations labeling, named entity recognition performance, image ellipses labeling, and more.
---
A month in the making, this new release brings a lot of bugfixes, updated documentation, and of course, a set of new features that have been requested.
## Label Studio Frontend
### Relations labeling
You can create relations between labeled regions. For example, if you place two bounding boxes, you can connect them with a relation. We've extended the functionality to include the direction of the relation and the ability to label the relation. Here is an example config for that:
```html
<View>
  <Relations>
    <Relation value="Is A" />
    <Relation value="Has Function" />
    <Relation value="Involved In" />
    <Relation value="Related To" />
  </Relations>
  <Labels name="lbl-1" toName="txt-1">
    <Label value="Subject"></Label>
    <Label value="Object"></Label>
  </Labels>
  <Text name="txt-1" value="$text"></Text>
</View>
```
### Named Entity Recognition performance
NER got an update: the representation of nested entities is clearer now, and it's optimized to support large texts.
<br>
<img src="/images/release-050-ner.png">
### Image Segmentation
This release includes an initial implementation of image segmentation using masks. You get two controls: a brush with configurable size, and an eraser. The output format is RLE as implemented by the [rle-pack](https://www.npmjs.com/package/@thi.ng/rle-pack) library.
There is a [new template](/templates/image_segmentation.html) available that provides more information about the setup.
### Changing the labels
Changing the labels of existing regions is now easy and supported for all data types.
### Validate labeling before submitting
Simple validation protects you from empty results. When choices or labels are required, you can specify the `required="true"` parameter for the `<Labels/>` or `<Choices/>` tag.
### Labels and Choices now support more markup
That enables you to build more complex interfaces. Here is an example that puts labels into different groups:
<br>
<img src="/images/release-050-views.png" style="max-width: 732px">
```html
<View>
  <Choices name="label" toName="audio" required="true" choice="multiple">
    <View style="display: flex; flex-direction: row; padding-left: 2em; padding-right: 2em; margin-bottom: 3em">
      <View style="padding: 1em 4em; background: rgba(255,0,0,0.1)">
        <Header size="4" value="Speaker Gender" />
        <Choice value="Business" />
        <Choice value="Politics" />
      </View>
      <View style="padding: 1em 4em; background: rgba(255,255,0,0.1)">
        <Header size="4" value="Speech Type" />
        <Choice value="Legible" />
        <Choice value="Slurred" />
      </View>
      <View style="padding: 1em 4em; background: rgba(0,0,255,0.1)">
        <Header size="4" value="Additional" />
        <Choice value="Echo" />
        <Choice value="Noises" />
        <Choice value="Music" />
      </View>
    </View>
  </Choices>
  <Audio name="audio" value="$url" />
</View>
```
### Image Ellipses labeling
A significant contribution from [@lrlunin](https://github.com/lrlunin) implements ellipse labeling for images; check out the [template](/templates/image_ellipse.html).
<img src="/images/screens/image_ellipse.png" class="img-template-example" title="Images Ellipse" />
### Misc
- **zoomControl, brightnessControl and contrastControl for the image tag** - zoom has been available for some time, but now an additional toolbar is created if one of the above params is provided to the `<Image/>` tag
- **select each region with shift+alt+number** - hotkeys to quickly navigate the regions
- **settings now show the hotkeys** - the defined and available hotkeys are shown inside the Hotkeys tab in the Settings
- **simplifying the creation of concave polygons** - polygons are not closed unless fully defined, which enables you to create concave polygons easily
- **HyperText works with its body** - now you can put HTML right into the HyperText tag; here is an example config:
```html
<View>
  <HyperText><h1>Hello</h1></HyperText>
</View>
```
## Label Studio Backend
### Multiplatform
Support for Windows, macOS, and Linux with Python 3.5 or greater.
### Extended import possibilities
There are now several ways to import your tasks for labeling:
- uploading files via [web UI](http://localhost:8080/import)
- by [specifying a path](/guide/tasks.html#Import-formats) to a file or directory with image, audio, or text files on Label Studio initialization
- using [import API](/guide/tasks.html#Import-using-API)
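For example, here is a sketch of importing tasks via the API (assuming a local instance; the task payload is a placeholder):
```bash
# Import two tasks as JSON; each object becomes one labeling task
curl -X POST -H 'Content-Type: application/json' http://localhost:8080/api/project/import \
  --data '[{"text": "first example"}, {"text": "second example"}]'
```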
### On-the-fly labeling config validation
Previously, changing a config after importing or labeling tasks could be dangerous because it could invalidate the created tasks/completions, so it was switched off. Now you don't have to worry about that - labeling config validation happens on the fly, taking into account the data already created. You can freely change the appearance of your project on the [setup page](http://localhost:8080/setup) and even add new labels - when you modify something crucial, you'll be alerted about it.
### Exporting with automatic converters
When finishing your project, go to the [export page](http://localhost:8080/export) and choose between the [common export formats](/guide/export.html#Export-formats) valid for your current project configuration.
### Connection to running Machine Learning backend
[Connecting to a running machine learning backend](/guide/ml.html) allows you to retrain your model continually and visually inspect how its predictions behave on tasks. Just specify the ML backend URL when launching Label Studio, and start labeling.
## Miscellaneous
### Docker support
Label Studio is now also maintained and distributed as a Docker container - [run a one-liner](/guide/index.html#Running-with-Docker) to build your own cloud labeling solution.
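As a sketch, the one-liner looks roughly like this (image name per Docker Hub; the mounted project path is an assumption):
```bash
# Map the UI port and mount a local project directory into the container
docker run --rm -p 8080:8080 -v $(pwd)/my_project:/label-studio/my_project heartexlabs/label-studio:latest
```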
### Multisession mode
You can launch Label Studio in [multisession mode](/guide/#Multisession-mode) - then each browser session dynamically creates its own project.

View File

@ -0,0 +1,84 @@
---
title: Label Studio Release Notes 0.6.0 - Nested Data Labeling
type: blog
order: 101
meta_title: Label Studio Release Notes 0.6.0 - Nested Data Labeling
meta_description: Label Studio Release 0.6.0 includes nested data labeling, per-region labeling, updates to machine learning backend integration, filtering, and more.
---
Two months in the making, this release features the evolution of the labeling interface: it now supports not only multiple data types and labeling scenarios, but also explores the path of bringing additional dimensions into the labeling task. Along with that purely UI work comes a major update to model-assisted labeling.
We've had a predictions panel for a while; the idea behind it is to provide a set of predictions, possibly coming from different models, to explore and adjust. Now there is a new models page to more easily manage what is connected and used for generating those predictions.
Here is more on the results of this update:
## Nested Labeling
Nested labeling enables you to specify multiple classification options that show up after you've selected a connected parent class:
<br/>
<img src="/images/release-060/nested_labeling.gif" class="gif-border" />
<br/>
It can match based on the selected Choice or Label value, and it works with the `required` attribute too, smartly selecting the region that you haven't labeled yet. To try it out, check the [Choices](/tags/choices.html) documentation and look for the following attributes: `visibleWhen`, `whenTagName`, `whenLabelValue`, `whenChoiceValue`.
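Here is a sketch of a nested config where a second group of choices appears only once a matching parent choice is selected (tag names and values are placeholders):
```html
<View>
  <Text name="txt" value="$text" />
  <Choices name="sentiment" toName="txt">
    <Choice value="Positive" />
    <Choice value="Negative" />
  </Choices>
  <!-- Shown only after the "Negative" parent choice is selected -->
  <Choices name="reason" toName="txt" visibleWhen="choice-selected"
           whenTagName="sentiment" whenChoiceValue="Negative">
    <Choice value="Spam" />
    <Choice value="Rude" />
  </Choices>
</View>
```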
## Per region labeling
With per-region labeling you can now provide additional attributes to the labeled regions. For example, when doing audio segmentation you can further classify each region. Per-region labeling is available for any data type and the following control tags: [Choices](/tags/choices.html), [TextArea](/tags/textarea.html), and [Rating](/tags/rating.html).
<br/>
<img src="/images/release-060/per-region.gif" class="gif-border" />
It integrates nicely with nested labeling; for example, you can provide multiple levels of classification for any particular region.
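For example, a sketch of an audio segmentation setup where each labeled region gets its own rating (it assumes the `AudioPlus` object tag for audio segmentation; names and values are placeholders):
```html
<View>
  <Labels name="label" toName="audio">
    <Label value="Speech" />
    <Label value="Noise" />
  </Labels>
  <AudioPlus name="audio" value="$url" />
  <!-- perRegion="true" ties the rating to the currently selected region -->
  <Rating name="quality" toName="audio" perRegion="true" />
</View>
```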
## Machine Learning Updates
There's a new ML page in the UI where you can specify URLs to connect ML backends, manually trigger model training, explore training statuses, and quickly check predictions by dragging and dropping tasks.
<br/>
<img src="/images/release-060/model_page.png" class="gif-border" />
### Multiple Backends
Label Studio now supports multiple ML backends connected at the same time. You can simultaneously get multiple predictions for each task and do comparative performance analysis for different models or different hyperparameters of a single model. It's possible to connect as many backends as you want by using the `--ml-backend url1 url2 ...` command-line option or by adding them via the UI.
### Connecting models
Creating & connecting a machine learning backend becomes way easier - simply define your model.py script with `.fit()` / `.predict()` methods and run the ML backend with `label-studio-ml start --init --script=model.py`. Check the quickstart and tutorials on how to connect sklearn and PyTorch models.
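For reference, here is a rough sketch of such a script, assuming the `label_studio_ml` SDK base class (the returned structures are placeholders):
```python
# model.py - a minimal sketch of an ML backend script
from label_studio_ml.model import LabelStudioMLBase

class MyModel(LabelStudioMLBase):
    def predict(self, tasks, **kwargs):
        # Return one prediction per task in Label Studio's results format
        return [{'result': [], 'score': 0.0} for _ in tasks]

    def fit(self, completions, **kwargs):
        # Train on the collected completions; return state to persist
        return {'checkpoint': 'path/to/weights'}
```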
## Filtering
When the number of labels or choices is big, looking for a particular one becomes tedious. The new `<Filter />` tag comes to the rescue. It works with any list of Labels / Choices, and it's keyboard-driven. Here is an example of the interaction:
<br/>
<img src="/images/release-060/filtering.gif" class="gif-border" />
Hitting `shift+f` puts focus on the filter input; hitting the Enter key then selects the first matching item.
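Here is a sketch of a config using the new tag (label values are placeholders):
```html
<View>
  <!-- Filter narrows down the list of labels below as you type -->
  <Filter name="filter" toName="lbl" hotkey="shift+f" minlength="1" />
  <Labels name="lbl" toName="txt">
    <Label value="Person" />
    <Label value="Organization" />
    <Label value="Location" />
  </Labels>
  <Text name="txt" value="$text" />
</View>
```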
## Display Label Names
Displaying labels on top of the labeled regions proved to be a useful feature when you'd like to verify the labeling. Visually inspecting the regions takes less time than switching between them one by one.
<br/>
<img src="/images/release-060/show-labels.gif" class="gif-border" />
### Models Scores
Along with the names of the labels, you can provide a prediction score for specific regions. That score may either come from the data that you upload or from the model that you've connected. When it's available, you can **Sort by the score** and quickly verify/adjust the labeling for the most “uncertain” regions.
## Keeping the label active
If you label the same type of data, it may be cumbersome to keep selecting the same label over and over again. Now you can choose to keep the last label active and use it for new labeling.
<br/>
<img src="/images/release-060/keep-label-active.gif" class="gif-border" />
Don't forget to unselect the region when you want to select a new label; otherwise, you'd change the label of the existing region.
## Bug fixes & improvements
* the `--host` setting is now available as a command-line argument (thanks to [@hachreak](https://github.com/hachreak))
* fixed upload with plain text tasks (thanks to [@gauthamsuresh09](https://github.com/gauthamsuresh09))
* fixed one-click deploy on Google Cloud (thanks to [@iCorv](https://github.com/iCorv))
* fixed URL paths for proxy safety (thanks to [@ezavesky](https://github.com/ezavesky))

View File

@ -0,0 +1,58 @@
---
title: Label Studio Release Notes 0.7.0 - Cloud Storage Enablement
type: blog
order: 100
meta_title: Label Studio Release Notes 0.7.0 - Cloud Storage
meta_description: Label Studio Release 0.7.0 includes new connectors to integrate Label Studio with cloud storage, including Amazon AWS S3 and Google Cloud Storage.
---
Just a couple of weeks after our 0.6.0 release, we're happy to announce a big new release. We started the discussion about the cloud months ago, and as the first step in simplifying the integration, we're happy to introduce cloud storage connectors, like AWS S3.
We're also very interested to learn more from you about your ML pipelines; if you're interested in having a conversation, please ping us on [Slack](http://slack.labelstud.io.s3-website-us-east-1.amazonaws.com?source=blog-release).
<br/>
<img src="/images/release-070/s3-mascot-04.png" />
## Connecting cloud storage
You can configure Label Studio to synchronize labeling tasks with your S3 or GCP bucket, optionally filtering by a specific prefix or a file extension. Label Studio will take that list and generate pre-signed URLs each time a task is shown to the annotator.
<br/>
<img src="/images/release-070/configure-s3.gif" class="gif-border" />
There are several ways Label Studio can load a file, either as a URL or as a blob; therefore, you can store either the list of tasks or the assets themselves and load that.
<br/>
<img src="/images/release-070/s3-config.png" class="gif-border" />
You can configure it to store the results back to S3/GCP, making Label Studio a part of your data processing pipeline. Read more about the configuration in the [docs](/guide/storage.html).
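As a sketch, launching with an S3 source could look roughly like this (flag names follow the storage docs of this release; the bucket name and filters are placeholders):
```bash
# Read tasks from an S3 bucket, filtering keys by prefix and extension
label-studio start my_project --init \
  --source s3 --source-path my-bucket \
  --source-params '{"prefix": "images/", "regex": ".*\\.jpg"}'
```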
## Frontend package updates
Finally, with a lot of [work](https://github.com/heartexlabs/label-studio-frontend/pull/75) from [Andrew](https://github.com/hlomzik), there is an implementation of frontend testing. This will make sure that we don't break things when we introduce new features. Along with that comes another important part: an improved building and publishing process with configured CI. Now the npm frontend package will be published along with the pip package.
## Labeling Paragraphs and Dialogs
Introducing a new object tag called "Paragraphs". A paragraph is a piece of text, potentially with additional metadata like the author and the timestamp. With this tag we're also experimenting with the idea of providing predefined layouts. For example, to label a dialogue you can use the following config: `<Paragraphs name="conversation" value="$conv" layout="dialogue" />`
<br/>
<img src="/images/release-070/dialogues.png" class="gif-border" />
This feature is available in the [enterprise version](https://heartex.ai/) only.
## Different shapes on the same image
One limitation Label Studio had was that you could use only one shape type on the same image; for example, you could put either bounding boxes or polygons, but not both. Now this limitation is lifted, and you can define different label groups and connect them to the same image.
<br/>
<img src="/images/release-070/multiple-tools.gif" class="gif-border" />
## maxUsages
There are a couple of ways to make sure that the annotation is performed in full. One of these is the `required` flag, and we've created a new one called `maxUsages`. For some datasets you know how many objects of a particular type there are, so you can limit the usage of specific labels.
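For example, a sketch that allows at most one "Airplane" box per image (label values are placeholders):
```html
<View>
  <RectangleLabels name="label" toName="img">
    <!-- maxUsages caps how many regions may carry this label -->
    <Label value="Airplane" maxUsages="1" />
    <Label value="Car" />
  </RectangleLabels>
  <Image name="img" value="$image" />
</View>
```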
## Bugfixes and Enhancements
- Allow different types of shapes to be used on the same image. For example, you can label the same image using both rectangles and ellipses.
- Fixing double text deserialization https://github.com/heartexlabs/label-studio-frontend/pull/85
- Fix bug with groups of required choices https://github.com/heartexlabs/label-studio-frontend/pull/74
- Several fixes for NER labeling: empty captured text, double clicks, labels appearance

View File

@ -0,0 +1,140 @@
---
title: Label Studio Release Notes 0.8.0 - Time Series Data Labeling
type: blog
order: 99
meta_title: Label Studio Release Notes 0.8.0 - Time Series Data Labeling
meta_description: Label Studio Release 0.8.0 includes updates to support data labeling for time series data.
---
# Time Series Data Labeling
Time series is everywhere! Devices, sensors, and events produce time series; for example, your heartbeat can be represented as a series of events measured every second, or your favorite step tracker records the number of steps you take per minute.
All these signals can be used for ML model development, and we're excited to present you with one of the first time series data labeling solutions that works across a variety of use cases and can help you develop ML applications based on time series data!
<br/>
<img src="/images/release-080/main.gif" class="gif-border" />
> Labeled time series data is crucial if you want to develop supervised ML models for pattern recognition. It can also serve as ground truth data for validating a method's performance. Read below for some of the scenarios and implementation details.
## Labeling UI Performance
A majority of time series datasets tend to have a lot of points. Therefore, the tool has to scale well to handle situations where you have more than 100K points. Initially, we tried to use some existing frontend libraries that provide a time series implementation, but it turned out that none of them were up to the task: even with just 10,000 points you'd start to experience lag when zooming or panning. It was clear that we needed to come up with a more robust implementation. We based the rendering on d3, and after numerous optimization attempts we got to the desired result:
### **1,000,000 data points and 10 channels**
<img src="/images/release-080/ui.gif" class="gif-border" />
Some of the techniques we have used include tiling: when we have a big number of data points, we split them into chunks and render those chunks first, which helps us achieve great performance when the number of data points is very large. When you zoom out, the algorithm samples specific points to give you an overview of your time series data.
## Working with a variety of input types out of the box
For the examples below, we will be using the following configuration:
```html
<View>
  <TimeSeriesLabels name="label" toName="ts">
    <Label value="Walk" />
    <Label value="Run" />
  </TimeSeriesLabels>
  <TimeSeries name="ts" valueType="url" value="$csv" sep="," overviewChannels="sen1,sen2">
    <Channel column="sen1" />
    <Channel column="sen2" />
  </TimeSeries>
</View>
```
> If you're new to Label Studio, [learn](https://labelstud.io/tags/) how you can use tags to set up different labeling interfaces for your data
Depending on where your time series data comes from, it can be formatted very differently. Label Studio provides a way to configure how time series parsing is done so you don't have to transform the original file. Let's start with a simple CSV like this:
```csv
time,sen1,sen2
100,1,23
101,2,34
102,3,45
```
Here's a CSV with a weirdly formatted datetime, captured from a sensor that doesn't follow the standard:
```csv
time,sen1,sen2
2020-Feb-01 9:30,34.23,272
2020-Feb-01 9:31,251.23,352
2020-Feb-01 9:32,337.124,327
```
In that case, the `timeFormat` attribute can handle the parsing for you; it uses [strftime](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) format codes.
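For example, a sketch that parses the sensor CSV above (the `timeFormat` string is an assumption matching that datetime layout):
```html
<TimeSeries name="ts" value="$csv" valueType="url" sep=","
            timeColumn="time" timeFormat="%Y-%b-%d %H:%M">
  <Channel column="sen1" />
  <Channel column="sen2" />
</TimeSeries>
```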
The `valueType` attribute controls whether the input is provided as-is or via a URL. For example, the input file may look like a list of URLs, and in that case `valueType="url"` will load the contents of each URL and expect time series data inside.
```csv
csvURL
http://example.com/path/to/timeseries1.csv
http://example.com/path/to/timeseries2.csv
```
For a headless CSV, you can use a column index to point to the right column. For example, using `2` in a Channel's `column` attribute would look for the third column (indexing starts from zero) inside the headless CSV.
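For example, a sketch for a headless CSV that uses indices instead of column names:
```html
<!-- Column 0 holds time; columns 1 and 2 are the sensor channels -->
<TimeSeries name="ts" value="$csv" valueType="url" timeColumn="0">
  <Channel column="1" />
  <Channel column="2" />
</TimeSeries>
```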
`timeColumn` is the name of the column with temporal data; note that you can skip it altogether, and then the time column is generated for you.
You can also use `timeDisplayFormat` to configure the desired output of the temporal column. It can be a number or a date: if the temporal column is a date, use strftime to format it; if it's a number, use [d3 number](https://github.com/d3/d3-format#locale_format) formatting.
## Zoom and Pan
Press the `ctrl` key and use your mouse wheel to zoom and pan. If you have a huge time series, changing the window position and size inside the overview may not let you zoom in as much as you'd like, because the window has a limit on its minimal width; in that case, you can continue zooming with the mouse wheel.
<br/>
<img src="/images/release-080/zoom.gif" class="gif-border" />
## Multivariate and Univariate
There are plenty of ways to set up the plots. Every defined channel is synchronized with every other channel defined inside the same time series object, giving you a multivariate time series labeling experience. You can also define multiple time series objects and get distinct, unsynchronized plots.
<br/>
<img src="/images/release-080/multi-uni.png" />
Use the `Channel` tag to represent each additional time series channel. By providing multiple channels you get a multivariate labeling interface and can label one channel by looking at the behavior of the other channels at the same timestamp.
> The `showTracker` attribute on the TimeSeries object controls whether you see the tracker, and holding the `shift` key syncs it between the channels.
## Instance labeling and snapping to the point
Double-click to place a bar labeling one particular data point instead of labeling an entire region. And when you're creating a region, it always gets snapped to the closest point.
<br/>
<img src="/images/release-080/instance.png" />
## Configuring overview
By default, the overview is created from the first channel, but you have control over that. Use `overviewChannels` to define which columns are included; it uses the same format as the `column` parameter, and you can include multiple channels in the overview by comma-separating them.
<br/>
<img src="/images/release-080/overview.png" />
## Synchronizing across data types [experimental]
It's not always the case that you can label time series just by looking at the plots. Different events may have different representations, and in such cases visual support is required. The TimeSeries tag can synchronize with audio or video.
<br/>
<img src="/images/release-080/videosync.png" />
This is an experimental feature right now, and we're working on finalizing the implementation, but if you have use cases, ping us in [Slack](http://slack.labelstud.io.s3-website-us-east-1.amazonaws.com?source=blog-release) and we will help you set it up.
## Next
Ready to try? [Install Label Studio](/guide/#Running-with-pip) following our guide and check the [template](/templates/time_series.html) on time series configuration. Also, join the Slack channel if you need any help or have feedback or feature requests.
Cheers!
## Resources
- Label Studio
- [Templates](/templates/time_series.html) - Label Studio pre-configured templates for Time Series
- [TimeSeries](/tags/timeseries.html) - Time Series tag specification
- [Channel](/tags/timeseries.html#Channel) - Channel tag specification
- Machine Learning
- https://github.com/awslabs/gluon-ts - Probabilistic time series modeling in Python
- https://github.com/alan-turing-institute/sktime - sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks.
- https://github.com/blue-yonder/tsfresh - Time Series feature extraction package

View File

@ -0,0 +1,37 @@
---
title: Label Studio Release Notes 0.8.0 - Time Series Support
type: blog
order: 99
---
## What problems does Label Studio solve with Time Series Labeling?
Time Series analysis is widely used in the medical and robotics fields.
<GIF-with-labeling-demo>
## Quickstart
1. You need to install and run Label Studio (LS) first. It can be done in many ways: using [pip](https://labelstud.io/guide/#Running-with-pip)
`pip install label-studio && label-studio start my_project --init`
or using [Docker](https://labelstud.io/guide/#Running-with-Docker), [GitHub sources](https://labelstud.io/guide/#Running-from-source), or the [one-click-deploy](https://github.com/heartexlabs/label-studio#one-click-deploy) button.
2. Open LS in the browser (for local usage it will usually be [http://localhost:8080](http://localhost:8080)).
3. Go to the Setup page ([http://localhost:8080/setup](http://localhost:8080/setup)). On this page you need to configure a labeling scheme for your project using LS tags. Read more about LS tags [in the documentation](/tags/timeseries.html). The fastest way to do it is to use the templates available on the Setup page:
<img src="/images/release-080/ts-templates.png" class="gif-border" />
4. Import your CSV/TSV/JSON via the Import page ([http://localhost:8080/import](http://localhost:8080/import)).
5. Start labeling ([http://localhost:8080/](http://localhost:8080/)).
## Special cases
### Multiple time series in one labeling config
If you want to use multiple time series tags in one labeling config, you need to host your CSV files manually and create a JSON file with tasks for import that contains links to those CSV files. Alternatively, you can store the time series data directly in the tasks.
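For example, a sketch of such a tasks file (the `csv1`/`csv2` keys are assumptions and must match the `value` attributes of your TimeSeries tags):
```json
[
  {
    "data": {
      "csv1": "http://example.com/path/to/timeseries1.csv",
      "csv2": "http://example.com/path/to/timeseries2.csv"
    }
  }
]
```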
### Video & audio sync with time series
It's possible to synchronize TimeSeries with video and audio in Label Studio. Right now you can do it by using the HyperText tag with the HTML objects `<audio src="path">`/`<video src="path">` together with TimeSeries. We have some solutions for this in the testing stage, and we can share them with you [by request in Slack](https://join.slack.com/t/label-studio/shared_invite/zt-qy37y73p-CCfEaEZvDylyQf4oatK40A).
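A rough sketch of such a config (it assumes a task data field `$videohtml` containing a `<video src="...">` snippet; this is an illustration, not an official recipe):
```html
<View>
  <!-- The HyperText body is rendered as raw HTML, so it can host a player -->
  <HyperText name="video" value="$videohtml" />
  <TimeSeries name="ts" value="$csv" valueType="url">
    <Channel column="signal" />
  </TimeSeries>
</View>
```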

View File

@ -0,0 +1,95 @@
---
title: Label Studio Release Notes 0.9.0
type: blog
image: /images/release-090/improve_your_datasets_with_labeling.jpg
order: 98
meta_title: Label Studio Release Notes 0.9.0
meta_description: Label Studio Release 0.9.0 includes improvements to data management to help you explore and improve your datasets for machine learning projects.
---
<div style="position: relative; padding-bottom: 62.5%; height: 0;"><iframe src="https://www.loom.com/embed/73b5122859d8478ab5ccb03fb6036208" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
> Above is an intro video by <a href="https://github.com/nicholasrq">Nikita</a>, one of the principal developers behind this update 👆
# Explore and improve your datasets with a new Data Manager
Hi everyone, it's the beginning of January, and we're glad to have you back from the holidays! Over the last couple of weeks we've been finalizing a new release, and we are now ready to announce it. The new 0.9.0 release is all about improving the experience of working with your dataset.
<br/>
<img src="/images/release-090/aerial_training_data_management.png" />
## Why better data management is important for Machine Learning
Labeling helps us develop better models that are capable of more accurate predictions. But the process can be very tedious. A naive way to label data is to label every sample inside the dataset. However, there are various scenarios where that's not the optimal way to label. For example, if we already have a model, it might be sufficient to only label the examples that the model is least sure about. If the dataset is too large, it can also make sense to label only a representative sample of the dataset.
In other datasets, you want to focus on specific items because you have extra knowledge about the feature relationships. For example, finding the words "bad" and "not recommended" can mean that the sample text has a negative sentiment. Another example might be to look for and correct misinterpretations of brand names when using speech-to-text systems. Or you might want to look only at the images where a pretrained model was not able to identify any objects, so that you can create the missing bounding boxes by hand.
In each of these cases, you want to focus on specific examples of data because you already know how to process them.
## Using filters to narrow down what to label
With the new data manager, you can use filters to narrow down what to label within a dataset. You can find samples with particular features, and when those features are present they can be labeled in a certain way.
Similar to how you might set up a rule in your email client to sort a specific email into a folder, you can use logic and filters to label a data sample.
<br/>
<img src="/images/release-090/data_slice_for_machine_learning_training_data.png" alt="Filter can help focus on specific data slices" />
You can use filters both for the data that you've uploaded and for annotations and other system data, for example, when the annotation was created.
## Use filters with tabs to review and label data slices
Filters are tab-specific, so you can use tabs to view and filter specific slices of your data. For example, you can use one tab to look at all pre-labeled items from a dataset with a prediction score of less than 0.50, and use another tab to review all the items in the dataset that have not yet been labeled.
<br/>
<img src="/images/release-090/data_slice_switch_for_ml_training_data.gif" class="gif-border" />
## Display data in a list or a grid depending on the data type
You can view the data in the data manager in a list or a grid. If your dataset contains images or other data with a large preview, you might want to view the data in grid format. Otherwise, you can use the list view to see the fields and columns of your data, with each row corresponding to a sample of your data.
<br/>
<img src="/images/release-090/grid_view_for_machine_learning_images.png" />
You can configure which fields appear for each row of data, which can be helpful when reviewing large CSV files.
<br/>
<img src="/images/release-090/control_panel.png" alt="Fields controls" />
### Fields available in the data manager
Label Studio displays both fields from your data and system fields from Label Studio itself. In the list view, the fields are used as columns.
The system fields used by Label Studio are as follows:
- **ID** - An ID is automatically generated for each item uploaded to Label Studio. You can use this ID when using the Label Studio API.
- **Completed** - The date and time that the item was annotated.
- **Completions** - The number of annotations associated with an item.
- **Cancelled** - The number of times that this item was skipped during labeling.
- **Predictions** - The number of predictions for this item.
- **Completions results** - The raw JSON of the annotation results. You can use this to filter by specific labels or classes provided inside an annotation.
- **Predictions score** - The score returned by the model when giving its prediction. You can use this for Active Learning scenarios.
- **Predictions results** - The raw JSON of the prediction object.
The data fields correspond to the dataset that you upload. Example data fields that you might see are “image” for uploaded image files, or column headers from a CSV file. You can also upload metadata inside a CSV file if that metadata might be valuable for labeling the dataset. For example, you might compute the dominant color of images and upload that metadata as part of a CSV file along with the image URL. You could then filter the dataset by dominant color and label the items accordingly. Another quite common scenario is to filter out a subset based on internal database information like an ID or DateTime.
### Fields support different data types
Each field in the data manager can be configured to show different types of data; right now the supported types are string, datetime, image, and audio.
<br/>
<img src="/images/release-090/select_data_type_for_ml.png" />
For example, if you're working with speech recognition, you can configure the field to be represented as an audio wave and have access to audio playback right within the manager.
## Use quick view to explore the dataset
When you click an item inside the data manager, a preview opens. In preview mode you can explore the different items in the dataset. This quick view is specific to the tab you're in and only shows items from that tab.
<br/>
<img src="/images/release-090/quickview_for_bounding_boxes_labeling.png" />
You can label the data in this preview mode, but unlike the labeling stream that you open by clicking the Label button, the preview mode doesn't switch to the next unlabeled item after you submit an annotation.
## Feedback
We made a lot of changes between the 0.8.0 and this 0.9.0 version! We'd love to hear your feedback and more about your experience using Label Studio. You can join our Slack channel or email us at <a href="mailto:hi@labelstud.io">hi@labelstud.io</a>.

View File

@ -0,0 +1,96 @@
---
title: Announcing Label Studio v1.0
type: blog
image: /images/release-100/announcement.png
order: 97
meta_title: Label Studio Release Notes 1.0.0
meta_description: Label Studio Release 1.0.0 rewrites Label Studio to support multiple users, projects, and scalable data labeling and annotation for machine learning and data science projects.
---
Hooray! The day is finally here. After almost a year and a half of development, 1000+ commits, 40 releases, and 50+ developers contributing minor and major parts, we're happy to announce a big milestone: **Label Studio v1.0!**
<br/>
<img src="/images/release-100/title.gif" class="gif-border" />
<center><h3 style="font-style: italic">We've rebuilt Label Studio from the ground up, thanks to a lot of feedback over the last year. Say hello to a multi-user, multi-project system. Plus it's scalable!</h3></center>
## Why use Label Studio?
Label Studio is an open source data labeling tool. It's the most robust and easiest-to-use solution for labeling a raw dataset or improving a previously labeled one. If you work on computer vision, NLP, conversational AI, audio/speech processing, or time-series projects, you can get up and running in minutes. Improving training data with Label Studio becomes a more transparent and easily managed process.
<br/>
<img src="/images/release-100/icons.png" class="gif-border" />
<center style="font-style: italic">Various data types you can label using Label Studio</center>
Label Studio also natively integrates with ML models. You can connect your models and keep updating them with new labeled data, as well as perform quality assurance on model predictions.
## What's new with 1.0
For the last four months, the Label Studio team has been on a nonstop journey of rethinking the interfaces, the annotation flow, and the robustness of the overall system.
<br/>
<img src="/images/release-100/open-source-github-pull-request.png" />
<center style="font-style: italic">Here is how this effort looks in terms of development time</center>
As a result, we've re-engineered almost everything! Data labeling is heavily dependent on the simplicity and ease of use of the user interface, so we streamlined and updated the entire Label Studio UI. In addition to the UI improvements, we have also improved the speed and performance of working with large datasets. Now you can work efficiently with datasets containing millions of items. Let's dive into the details!
### Data labeling by multiple users
One of the biggest changes we've introduced in this version is the addition of user accounts. Now, multiple users can create accounts in the same Label Studio instance. Users can work off of the same datasets, and each user's annotations are tied to their account.
<br/>
<img src="/images/release-100/users.png" />
<center style="font-style: italic">The People page showing the list of users</center>
### Multiple projects to handle all your datasets in one place
Label Studio Projects enable you to create and save labeling configurations for different datasets or projects. Label Studio Projects streamline managing and working on different datasets, can be shared with other users, and can be reused for similar projects in the future.
<br/>
<img src="/images/release-100/projects-list.png" />
We also restructured the project settings and made it easier to configure the labeling interface for each project. A few updates worth mentioning:
#### Model Assisted Labeling
ML models can help pre-label data and optimize the data labeling process. For example, connect a segmentation model like Mask R-CNN to provide its predictions, and then adjust those predictions to make them perfect. Another example: connect an ASR model to provide speech transcriptions for further labeling.
<br/>
<img src="/images/release-100/ml-assistance.png" />
<center style="font-style: italic">Adding machine learning model for assisted labeling</center>
#### Read data from cloud storage
If you store your data in the cloud, Label Studio can natively sync with it. Out of the box you can configure Label Studio to read data from AWS S3, GCP, or Microsoft Azure. You can sync from multiple cloud providers or buckets at the same time, and each project can connect to a different cloud storage location.
<br/>
<img src="/images/release-100/cloud-storage-modal.png" />
<center style="font-style: italic">Configuration to read audio files stored in an S3 bucket</center>
#### Interface configuration wizard
You can start labeling quickly by using a template to configure the labeling interface for your project, or enjoy a greater level of flexibility with custom tags.
<br/>
<img src="/images/release-100/wizard.png" />
<center style="font-style: italic">Dozens of the most common data labeling scenarios have templates <GIF showing selecting different templates></center>
### Label large datasets with new scalable data labeling backend
We migrated to a more robust, Django-based backend from our enterprise version. We also transitioned from filesystem-based to SQL-based storage for tasks and annotations. While a filesystem is probably the simplest approach to storage, it doesn't scale well when you work on datasets with more than 10,000 items. With SQLite as a storage backend, we can now easily upload datasets of hundreds of thousands of items.
<br/>
<img src="/images/release-100/data-manager-filtering.gif" class="gif-border" />
<center style="font-style: italic">Here is filtering performance on dataset with 250K items</center>
> For production deployments, we recommend using PostgreSQL instead of SQLite, especially if you expect to create a large number of users or projects in parallel, because SQLite doesn't support parallel writes.
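As a sketch, switching to PostgreSQL is done via environment variables (variable names per the Label Studio database settings; the values are placeholders):
```bash
export DJANGO_DB=default          # selects the PostgreSQL settings block
export POSTGRE_NAME=labelstudio
export POSTGRE_USER=postgres
export POSTGRE_PASSWORD=secret
export POSTGRE_HOST=localhost
export POSTGRE_PORT=5432
label-studio start
```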
## What's next
We hope you're excited to try out this new version of Label Studio!
For the next month we will be focusing on fixing bugs and issues based on your feedback, so we'd like to ask you to join our Slack channel. The community is very active: no questions go unanswered and no feedback goes unnoticed.
<a href="http://slack.labelstud.io.s3-website-us-east-1.amazonaws.com?source=blog-release" title="Data labeling community">Join Slack</a>
Next, we'll be releasing an update of our <a href="https://heartex.com/">Label Studio Enterprise</a> and then working on a new version of Label Studio. The focus for that version is more performance improvements and seamless integration into various ML pipelines. See you in Slack!

View File

@ -0,0 +1,97 @@
---
title: Label Studio v1.1 is now available!
type: blog
image: /images/release-110/multi-labeling.gif
order: 93
meta_title: Label Studio Release Notes 1.1.0
meta_description: Release notes and information about Label Studio version 1.1.0, with improved data labeling functionality for image annotations and object character recognition (OCR) labeling for machine learning projects.
---
Label Studio version 1.1 is now available, delivering on our promises in our [public roadmap](https://github.com/heartexlabs/label-studio/blob/master/roadmap.md).
Our main focus for this release was to improve the image annotation experience, whether you're adding shapes and bounding boxes, drawing brush masks, or performing optical character recognition (OCR) with images.
<br/><img src="/images/release-110/label-multiple-regions.gif" alt="Gif of adding polygons and rectangle regions and then labeling them to an aerial image of a city in the Label Studio UI." class="gif-border" width="800px" height="425px" />
Read on for the exciting highlights of this release!
## Performance improvements
We want Label Studio to be faster and more responsive when adding bounding boxes and shapes to images, so this release includes performance optimizations. Now you can add hundreds of bounding boxes to an image without significant user interface delays.
## Quickly create any type of image region
If you want to combine different types of regions in your image annotations, now you can!
Draw whichever shapes or masks make sense for your images, from rectangles, ellipses, and polygons to brush masks and keypoints!
You can combine the different types of Control tags in the labeling configuration that you create for the labeling interface, like the following example:
```xml
<View>
  <Image name="image" value="$image" />
  <Rectangle name="rect" toName="image" />
  <Ellipse name="ellipse" toName="image" />
  <KeyPoint name="kp" toName="image" />
  <Polygon name="polygon" toName="image" />
  <Brush name="brush" toName="image" />
  <Labels name="labels" toName="image" fillOpacity="0.5" strokeWidth="5">
    <Label value="Building" background="green" />
    <Label value="Vehicle" background="blue" />
    <Label value="Pavement" background="red" />
  </Labels>
</View>
```
Then, you can use the multi-tool selector to choose whether to add a rectangle, ellipse, keypoint, polygon, or brush region to the image.
<br/><img src="/images/release-110/multi-labeling.gif" alt="Gif of adding polygons and brush labels to an aerial image of a city in the Label Studio UI." class="gif-border" width="800px" height="519px" />
See more in [Advanced image labeling](/guide/labeling.html#Advanced-image-labeling).
## Quickly create, hide, and remove regions
With this added flexibility in adding regions comes faster labeling! You can now add a rectangle or an ellipse to your image with just two clicks, or double click to create a polygon, rectangle, or ellipse.
If you accidentally select a point on an image while creating a polygon, just double click to remove the erroneous point and continue creating the region. You need to have at least three polygon points to be able to remove one.
<br/><img src="/images/release-110/deletepolygonpoint.gif" alt="Gif of drawing a polygon and removing an accidental point of the polygon on an image in the Label Studio UI." class="gif-border" />
While you could previously show or hide regions one by one, now you can toggle the visibility of all regions at once, or hide all regions for a specific label. This makes it easier for you to create overlapping regions and look at specific labeled regions together.
## Import partial predictions and finish labeling in Label Studio
If you perform data annotation in stages or with different groups of annotators, you might want to separate creating regions with bounding boxes and brushes, from assigning labels to those regions. With Label Studio 1.1, that's now possible!
You can now separate creating regions from assigning labels, which means you can import predicted bounding boxes or polygons from a machine learning model, then correct the placement of the detected objects and finish labeling them in Label Studio. This workflow is perfect for two-step labeling, where you want one annotator, or a machine learning model, to create regions and another annotator to label the regions.
<br/><img src="/images/release-110/label-predicted-regions.gif" alt="Gif of labeling unlabeled rectangular, polygonal, and elliptical regions using the Label Studio UI." class="gif-border" width="800px" height="535px" />
For example, if you have a machine learning model to perform object detection that identifies regions of interest in images, you can upload those predictions to Label Studio and have human annotators apply labels to those regions of interest. If you're doing that with OCR, you can use a machine learning model to identify which regions in an image have text, and then add those predictions to Label Studio and have human annotators transcribe the recognized text.
For more details and example JSON formats, see [Import pre-annotated data into Label Studio](/guide/predictions.html#Import-pre-annotated-regions-for-images). To create regions yourself, see [Advanced image labeling](/guide/labeling.html#Advanced-image-labeling).
## YOLO export support
Label Studio now supports exporting annotations in YOLO labeling format, which is especially helpful for image annotations. Read more in [Export annotations and data from Label Studio](/guide/export.html#YOLO).
## OCR improvements
Beyond the expanded image annotation functionality, we've also improved support for OCR labeling for when you're extracting text from images.
<br/><img src="/images/release-110/OCR-example.gif" alt="Gif of adding recognized text in the sidebar after adding a rectangle bounding box on a receipt for a cotton canvas bag in the Label Studio UI." class="gif-border" width="800px" height="524px" />
You can now write the text for a selected region in the sidebar, rather than at the bottom of the labeling interface, making it easier to see all the recognized text regions that you've identified and transcribed.
## Stay in touch
Sign up for the [Label Studio Newsletter](https://labelstudio.substack.com/) to find out about new features, tips for using Label Studio, and information about machine learning research and best practices.

112
docs/source/blog/styles.css Normal file
View File

@ -0,0 +1,112 @@
h1 {
margin-top: 2.5em !important;
}
.content {
max-width: 1200px !important;
width: unset !important;
margin: 60px auto 50px auto;
padding: 0;
}
.blog-body {
margin-bottom: 100px;
}
.grid {
display: -webkit-box;
display: -ms-flexbox;
display: flex;
-webkit-box-orient: horizontal;
-webkit-box-direction: normal;
-ms-flex-direction: row;
flex-direction: row;
-ms-flex-wrap: wrap;
flex-wrap: wrap;
-webkit-box-align: stretch;
-ms-flex-align: stretch;
align-items: stretch;
padding: 0;
}
.column {
width: 50% !important;
}
.highlight {
border: 2px solid rgba(244, 138, 66, 0.75);
}
.card {
margin: 2em 1em;
}
.card .image-wrap {
transition: linear 0.25s;
border-radius: 7px;
box-shadow: 0 0 2px rgba(0, 0, 0, 0.3);
padding: 5px;
opacity: 0.8;
}
.card .image-wrap:hover {
opacity: 1;
box-shadow: 0 0 5px rgba(0, 0, 0, 0.3);
transition: linear 0.25s;
}
.card .image-wrap .image {
margin: 0 auto;
width: 95%;
height: 250px;
background-size: contain;
background-repeat: no-repeat;
background-position: center center;
}
.card .category {
cursor: pointer;
display: inline-block;
color: green;
margin-top: 18px;
font-size: 80%;
font-weight: 500;
letter-spacing: .08em;
text-transform: uppercase;
}
.card .title {
margin-top: 0.5em;
font-size: 130%;
font-weight: bold;
color: #555;
}
.card .desc {
float: right;
margin-top: 18px;
font-size: 80%;
font-weight: normal;
color: #777;
}
.sidebar {
display: none;
}
@media screen and (max-width: 900px) {
.sidebar {
display: flex;
}
}
@media only screen and (max-width: 768px) {
.grid {
width: auto;
margin-left: 0 !important;
margin-right: 0 !important;
}
.column {
width: 100% !important;
margin: 0 0 !important;
-webkit-box-shadow: none !important;
box-shadow: none !important;
padding: 1rem 1rem !important;
}
}

96
docs/source/guide/FAQ.md Normal file
View File

@ -0,0 +1,96 @@
---
title: Troubleshoot Label Studio
short: Troubleshooting
type: guide
order: 204
meta_title: Troubleshoot Label Studio
meta_description: Troubleshoot common issues with Label Studio configuration and performance so that you can return to your machine learning and data science projects.
---
If you encounter an issue using Label Studio, use this page to troubleshoot it.
## Blank page when loading a project
After starting Label Studio and opening a project, you see a blank page. Several possible issues could be the cause.
### Cause: Host not recognized
If you specify a host without a protocol such as `http://` or `https://` when starting Label Studio, Label Studio can fail to locate the correct files to load the project page.
To resolve this issue, update the host specified as an environment variable or when starting Label Studio. See [Start Label Studio](start.html).
## Slowness while labeling
If you're using the SQLite database and another user imports a large volume of data, labeling might slow down for other users on the server due to the database load.
If you want to upload a large volume of data (thousands of items), consider doing that at a time when people are not labeling or use a different database backend such as PostgreSQL or Redis. You can run Docker Compose from the root directory of Label Studio to use PostgreSQL: `docker-compose up -d`, or see [Sync data from cloud or database storage](storage.html).
## Image/audio/resource loading error while labeling
The most common problem with resource loading is a <b>CORS</b> (Cross-Origin Resource Sharing), or cross-domain, issue. When you try to fetch a picture from external hosting, it can be blocked for security reasons. Go to the browser console (Ctrl + Shift + i in Chrome) and check for errors there. Typically, this problem is solved by configuring the external host.
<br>
<center>
<img src='../images/cors-lsf-error.png' style="max-width:300px; width: 100%; opacity: 0.8">
<br/><br/>
<img src='/images/cors-error.png' style="max-width:500px; width: 100%; opacity: 0.8">
<br/><br/>
<img src='/images/cors-error-2.png' style="max-width:500px; width: 100%; opacity: 0.8">
</center>
- If you have admin access to the hosting server, you need to allow CORS on the web server. For example, on nginx, you can try adding <a href="javascript:void(0)" onclick="$('#nginx-cors-code').toggle()">these lines</a> to `/etc/nginx/nginx.conf` inside your `location` section:
```
location <YOUR_LOCATION> {
  if ($request_method = 'OPTIONS') {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    # Custom headers and headers various browsers *should* be OK with but aren't
    add_header 'Access-Control-Allow-Headers' 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
    # Tell client that this pre-flight info is valid for 20 days
    add_header 'Access-Control-Max-Age' 1728000;
    add_header 'Content-Type' 'text/plain; charset=utf-8';
    add_header 'Content-Length' 0;
    return 204;
  }
  if ($request_method = 'POST') {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
    add_header 'Access-Control-Expose-Headers' 'Content-Length,Content-Range';
  }
  if ($request_method = 'GET') {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
    add_header 'Access-Control-Expose-Headers' 'Content-Length,Content-Range';
  }
}
```
- If you use Amazon S3 with Label Studio, see [Troubleshoot CORS and access problems](storage.html#Troubleshoot-CORS-and-access-problems).
- If you use Google Cloud Storage with Label Studio, see [Troubleshoot CORS and access problems](storage.html#Troubleshoot-CORS-and-access-problems).
- If you serve your data from an HTTP server created like this: `python -m http.server 8081 -d <directory>`, it won't send CORS headers, so run the following from the command line instead:
```bash
npm install http-server -g
http-server -p 3000 --cors
```
Not every host supports CORS setup, but you can try to locate the CORS settings in the admin area of your host configuration.
<br/>
## Audio wave doesn't match annotations
If you find that after annotating audio data the visible audio wave doesn't match the timestamps and the sound, try converting the audio to a different format. For example, if you are annotating mp3 files, try converting them to wav files:
```bash
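# -y overwrites the output, -ar 8k resamples to 8 kHz, -ac 1 downmixes to mono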
ffmpeg -y -i audio.mp3 -ar 8k -ac 1 audio.wav
```

224
docs/source/guide/api.md Normal file
View File

@ -0,0 +1,224 @@
---
title: Backend API
type: guide
order: 907
meta_title: API Endpoints
meta_description: API documentation for authenticating, listing data science projects, importing predictions and raw data and exporting annotated data, and user management in Label Studio.
---
## API reference for Label Studio 1.0.0
You can use the Label Studio API to import data for labeling, export annotations, set up machine learning with Label Studio, and sync tasks with cloud storage.
See the [API reference documentation](/api) for further guidance and interactive examples.
### Authenticate to the API
You must retrieve your access token so that you can authenticate to the API.
1. In the Label Studio UI, click the user icon in the upper right.
2. Click **Account & Settings**.
3. Copy the access token.
In your first API call, specify the access token in the headers:
```bash
curl -X <method> <Label Studio URL>/api/<endpoint> -H 'Authorization: Token <token>'
```
You can also retrieve the access token using the command line.
1. From the command line, run the following:
```bash
label-studio user --username <username>
```
2. In the output returned in your terminal, the token for the user is listed as part of the user info.
See [API documentation for authentication](/api#section/Authentication).
### List all projects
To perform most tasks with the Label Studio API, you must specify the project ID, sometimes referred to as the `pk`. If you don't know what your project ID is, you might want to get a list of all projects in Label Studio that you can access. See the [List your projects API endpoint documentation](/api#operation/api_projects_list).
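For example, a sketch of that call (assuming a local instance and your access token):
```bash
curl http://localhost:8080/api/projects -H 'Authorization: Token <token>'
```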
### Create and set up a project
Create a project and set up the labeling interface in Label Studio using the API. See the [Create new project API endpoint documentation](/api#operation/projects_create).
If you want to make sure the configuration for your labeling interface is valid before submitting it using the API, you can use the [validate label config](/api#operation/projects_validate_create) API endpoint.
### Import tasks using the API
To import tasks using the API, make sure you know the project ID that you want to add tasks to. See additional examples and parameter descriptions in the [import data endpoint documentation](/api#operation/projects_import_create).
### Retrieve tasks
Retrieve a paginated list of tasks for a specific project. If you want, you can also retrieve tasks and annotations using this API endpoint, as an alternative to exporting annotations. See details and parameters in the [list project tasks endpoint documentation](/api#operation/projects_tasks_list).
### Export annotations
To export annotations, first see [which formats are available to export for your project](/api#operation/api_projects_export_formats_read).
Choose your selected format from the response and then call the export endpoint. See the [export annotations](/api#operation/projects_export_list) endpoint documentation for more details.
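For example, a sketch of the two calls (project ID `1` and the token are placeholders):
```bash
# 1. See which export formats the project supports
curl http://localhost:8080/api/projects/1/export/formats -H 'Authorization: Token <token>'
# 2. Export annotations in the chosen format
curl "http://localhost:8080/api/projects/1/export?exportType=JSON" \
  -H 'Authorization: Token <token>' > annotations.json
```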
### API endpoint reference for older Label Studio versions
These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1. Use the [API documentation](/api) linked inside Label Studio for guidance when working with version 1.0.0.
### Set up project configuration
> These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1.
`POST /api/project/config`
Save the labeling config for a project using the API:
```
curl -X POST -H Content-Type:application/json http://localhost:8080/api/project/config \
--data "{\"label_config\": \"<View>[...]</View>\"}"
```
Or by reading from a local config.xml file:
```
curl -X POST -H Content-Type:application/xml http://localhost:8080/api/project/config \
--data @config.xml
```
The backend returns status 201 if the config is valid and saved.
If errors occur, the backend returns status 400 and the response body is a JSON dict like the following:
```
{
  "label_config": ["error 1 description", "error 2 description", ...]
}
```
### Import data, files and tasks
> These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1.
`POST /api/project/import`
Use the API to import tasks in [Label Studio basic format](tasks.html#Basic-format), which can be useful when you are creating a data stream.
```bash
curl -X POST -H Content-Type:application/json http://localhost:8080/api/project/import \
--data "[{\"my_key\": \"my_value_1\"}, {\"my_key\": \"my_value_2\"}]"
```
Or you can import a file and make a Label Studio task automatically:
```bash
curl -X POST -F "FileUpload=@test.jpg" http://localhost:8080/api/project/import
```
### Retrieve project
> These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1.
`GET /api/project`
You can retrieve project settings, including the total task count, using the API in JSON format:
```bash
curl http://localhost:8080/api/project/
```
Response example:
```json
{
...
"task_count": 3,
...
}
```
### Retrieve tasks
> These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1.
`GET /api/tasks`
To get tasks with pagination in JSON format:
```bash
curl "http://localhost:8080/api/tasks?page=1&page_size=10&order=-completed_at"
```
Response example:
```json
[
{
"completed_at": "2020-05-29 03:31:15",
"completions": [
{
"created_at": 1590712275,
"id": 10001,
"lead_time": 4.0,
"result": [ ... ]
}
],
"data": {
"image": "s3://htx-dev/dataset/training_set/dogs/dog.102.jpg"
},
"id": 2,
"predictions": []
}
]
```
`order` can be one of `id`, `-id`, `completed_at`, or `-completed_at`.
### Export annotations
> These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1.
`GET /api/project/export`
You can use the API to request a file with all annotations, for example:
```bash
curl "http://localhost:8080/api/project/export?format=JSON" > exported_results.json
```
The format descriptions are presented [in the export documentation](export.html).
The `format` parameter values can be found in the dropdown on the Export page (`JSON`, `JSON_MIN`, `COCO`, `VOC`, etc.).
### Health check for Label Studio
Label Studio has a special endpoint to run health checks:
```bash
GET /api/health
```
### Reference
Label Studio API endpoint reference.
> These API endpoints were introduced in Label Studio version 0.8.1 and are only valid until version 0.9.1.
| URL | Description |
| --- | --- |
| **Project** |
| /api/project | `GET` return project settings and states (like task and completion counters) <br> `POST` create a new project for multi-session-mode with `desc` field from request args as project title <br> `PATCH` update project settings |
| /api/project/config | `POST` save project configuration (labeling config, etc) |
| /api/project/import | `POST` import data or annotations |
| /api/project/export | `GET` download annotations; pass the `format` param to specify the format |
| /api/project/next | `GET` return next task available for labeling |
| /api/project/storage-settings | `GET` current storage settings <br> `POST` set storage settings |
| /api/project-switch | `GET` switch to specified project by project UUID in multi-session mode |
| **Tasks** |
| /api/tasks | `GET` retrieve all tasks from project <br> `DELETE` delete all tasks from project |
| /api/tasks/\<task_id> | `GET` retrieve specific task <br> `DELETE` delete specific task <br> `PATCH \| POST` rewrite task with data, completions and predictions (it's very helpful for changing data in time and prediction updates) |
| /api/tasks/\<task_id>/completions | `POST` create a new completion <br> `DELETE` delete all task completions |
| /api/tasks/\<task_id>/completions/\<completion_id> | `PATCH` update completion <br> `DELETE` delete completion |
| /api/completions | `GET` returns all completion ids <br> `DELETE` delete all project completions |
| **Machine Learning Models** |
| /api/models | `GET` list all models <br> `DELETE` remove model with `name` field from request json body |
| /api/models/train | `POST` send training signal to ML backend |
| /api/models/predictions?mode={data\|all_tasks\|specific_tasks} | `GET \| POST`<br> `mode=data`: generate ML model predictions for one task from `data` field of request json body<br> `mode=all_tasks`: generate ML model predictions for all LS DB tasks <br> `mode=specific_tasks`: generate predictions for tasks specified in "task_ids" JSON data or in path arguments, e.g.: <nobr><i>?mode=specific_tasks&task_ids=1,2,3</i></nobr> |
| **Helpers** |
| /api/validate-config | `POST` check labeling config for errors |
| /api/import-example | `GET \| POST` generate example for import by given labeling config |
| /api/import-example-file | `GET` generate example file for import using current project labeling config |
| /api/health | `GET` health check |
| /version | `GET` current Label Studio Backend and Frontend versions |
@ -0,0 +1,186 @@
---
title: Update scripts and API calls in Label Studio Enterprise
short: Update scripts and API calls
badge: <i class='ent'></i>
type: guide
order: 910
meta_title: Update scripts and API calls to new version
meta_description: Label Studio Enterprise documentation about updates and changes to the API endpoints in version 2.0.
---
With the new version of Label Studio Enterprise, you must update your scripts and API calls to match new API endpoints and formats. Some endpoints are new, some arguments for existing endpoints are deprecated and removed, and some payloads have changed for POST requests.
> Throughout the new version, `completions` have been renamed to `annotations`. In addition, "Teams" are now called "Workspaces" to better reflect that they are a way to organize projects, rather than people.
> If you rely on existing object IDs (such as `project_id`, `task_id`, or `annotation_id`), note that these likely changed during the database migration.
## Import data
One endpoint has been deprecated and the payload and response have small updates.
### Update bulk task import calls
The endpoint `/api/projects/<int:project_id>/tasks/bulk` still works, but is deprecated and will be removed in the future.
Update calls to that endpoint to use the `/api/projects/<project_ID>/import` endpoint instead. See the [import task data API endpoint documentation](/api#operation/api_projects_import_create).
### Update task import payload for pre-annotations
When you import pre-annotations into Label Studio, the `completions` field is now `annotations`. See the [import task data API endpoint documentation](/api#operation/api_projects_import_create).
### Changes to the endpoint response
The endpoint response returns an `annotation_count` field instead of a `completion_count` field. If your script expects a response with this field, update it to expect a response with the new field.
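For example, a minimal sketch of an updated import call that reads the new field (the project ID, host, port, and API token are placeholders, and piping to `jq` is just one way to inspect the response):
```bash
curl -X POST 'http://localhost:8000/api/projects/1/import' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Token <your-api-token>' \
  --data '[{"data": {"text": "example task"}}]' \
  | jq '.annotation_count'
```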
## Export data
The export endpoint has changed, and so have the available options for that endpoint and the response parameters.
### Updated export endpoint
To export annotations from Label Studio Enterprise, you must call a new endpoint.
Requests made to `/api/projects/<project_ID>/results` fail. Instead, call `/api/projects/<project_ID>/export?exportType=JSON`. See the [export API endpoint documentation](/api#operation/api_projects_export_read).
With this change, several arguments are no longer supported:
| Deprecated argument | New behavior |
| --- | --- |
| `?aggregator_type=` | Cannot export aggregated annotations. |
| `?finished=0` | No longer a default setting. Instead, use `download_all_tasks=true`. |
| `?return_task=1` | Endpoint always returns all tasks. |
| `?return_predictions=1` | Endpoint always returns predictions. |
| `?return_ground_truths=1` | Endpoint always returns ground truth annotations. |
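For example, a sketch of the updated export call (the project ID, host, and token are placeholders):
```bash
curl -H 'Authorization: Token <your-api-token>' \
  'http://localhost:8000/api/projects/1/export?exportType=JSON&download_all_tasks=true' \
  > annotations.json
```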
### Changes to the endpoint response
In the endpoint response when exporting annotations, the `"completions":` section is renamed to `"annotations"`.
The content of the response also has some changes:
- `aggregated` is removed
- `aggregated_completed_by` is removed
- `aggregated_ids` is removed
- `ground_truth` is removed
- `result` is no longer a double list `"result": [[... ]]` and is now a single list `"result": [...]`
- `completed_by` IDs now refer to the actual user IDs (not "expert" IDs as before)
#### Previous version response
```json
"completions": [
{
"aggregated": true,
"aggregated_completed_by": [
103
],
"aggregated_ids": [
800955
],
"ground_truth": false,
"result": [
[
{
"id": "7tHQ-n6xfo",
"type": "choices",
"value": {
"choices": [
"Neutral"
]
},
"to_name": "text",
"from_name": "sentiment"
}
]
]
}
]
```
#### Current version response
```json
"annotations": [
{
"result": [
{
"id": "7tHQ-n6xfo",
"type": "choices",
"value": {
"choices": [
"Neutral"
]
},
"to_name": "text",
"from_name": "sentiment"
}
]
}
]
```
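If downstream scripts still consume old exports, a hypothetical `jq` one-liner like the following can flatten the legacy double-list `result` into the new single-list shape. This assumes the export file is a JSON list of tasks and that `jq` is installed:
```bash
# Flatten "result": [[...]] to "result": [...] in each old completion
jq '.[].completions[].result |= (if length > 0 and (.[0] | type) == "array" then .[0] else . end)' \
  old_export.json > flattened_export.json
```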
## Invite people and create accounts
When you invite people to join your organization, workspace (formerly team), and projects, there are many changes.
### Update tokens in use
The old tokens are no longer supported. Make a request to reset the organization token that you use to create invitation links.
See the [reset organization token API endpoint documentation](/api#operation/api_invite_reset-token_create).
### Updated URL to invite people to your organization
The URL that you use to invite people to your Label Studio Enterprise organization has changed from `https://app.heartex.ai/organization/welcome/<token>` to `http://localhost:8000/user/signup/?token=<token>`.
You can generate the token for that URL using the [organization invite link API endpoint](/api#operation/api_invite_list), then use the response to create the invitation URL.
For example, with the example response:
```json
{
"token": "111a2b3333cd444e",
"invite_url": "/user/signup/?token=111a2b3333cd444e"
}
```
Create an invitation URL of `http://localhost:8000/user/signup/?token=111a2b3333cd444e`.
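As a sketch, you can also build the URL in one step, assuming `GET /api/invite` is the list endpoint referenced above and `jq` is available (the token header is a placeholder):
```bash
# Fetch the invite_url field from the invite endpoint and prepend the host
INVITE_PATH=$(curl -s -H 'Authorization: Token <your-api-token>' \
  http://localhost:8000/api/invite | jq -r '.invite_url')
echo "http://localhost:8000${INVITE_PATH}"
```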
### Updated flow for inviting project members
Links that invite people directly to projects are no longer supported. Instead, perform the following steps in order:
1. Invite the person to the organization with the [link returned by the organization invite API endpoint](/api#operation/api_invite_list).
2. Change the user's role to annotator with the [organization membership role endpoint](/api#operation/api_organizations_memberships_partial_update).
3. Add the user to the project by making a [POST request to the project member endpoint](/api#operation/api_projects_members_create).
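A hedged end-to-end sketch of those three calls follows; the endpoint paths are inferred from the operation IDs above, and the payload field names and the annotator role value are assumptions:
```bash
AUTH='Authorization: Token <your-api-token>'
# 1. Get the organization invite link to share with the user
curl -H "$AUTH" http://localhost:8000/api/invite
# 2. Change the user's role to annotator (role code is an assumption)
curl -X PATCH -H "$AUTH" -H 'Content-Type: application/json' \
  --data '{"role": "AN"}' \
  http://localhost:8000/api/organizations/1/memberships/2
# 3. Add the user to project 1 (payload field name is an assumption)
curl -X POST -H "$AUTH" -H 'Content-Type: application/json' \
  --data '{"user": 2}' \
  http://localhost:8000/api/projects/1/members
```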
## Create and update external and cloud storage
Some endpoints have been updated and some payload parameters are different when performing actions with the storage API.
### Updates to storage endpoints
When you want to retrieve information about a storage configuration, specify the type of storage in the API endpoint.
Instead of `/api/storages/<int:pk>/`, call `api/storages/s3?project=<project_ID>` for Amazon S3 storage connections for the specific project. See the API documentation to [get Amazon S3 storage](/api#operation/api_storages_s3_list). You can also call `api/storages/s3/<storage_ID>` to get the details of a specific storage connection. See the API documentation to [get a specific Amazon S3 storage connection](/api#operation/api_storages_s3_read).
The same change applies when syncing storage.
Instead of `/api/storages/<int:pk>/sync/`, call `/api/storages/s3/<project_ID>/sync` to [sync Amazon S3 storage](/api#operation/api_storages_s3_sync_create).
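For example, using the endpoints above (the host, token, project, and storage IDs are placeholders):
```bash
# List Amazon S3 storage connections for project 1
curl -H 'Authorization: Token <your-api-token>' \
  'http://localhost:8000/api/storages/s3?project=1'
# Trigger a sync, following the sync endpoint path above
curl -X POST -H 'Authorization: Token <your-api-token>' \
  'http://localhost:8000/api/storages/s3/1/sync'
```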
### Updates to creating or listing storage payload parameters
Some parameters have changed as listed in the following table:
| Previous parameter | New parameter |
| --- | --- |
| path | bucket |
| regex | regex_filter |
| data_key | None. Instead, BLOBs are attached to the first available object tag |
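For example, a sketch of a create-storage request using the new parameter names; the POST endpoint and the `project` field are assumptions based on the list endpoint above:
```bash
curl -X POST \
  -H 'Authorization: Token <your-api-token>' \
  -H 'Content-Type: application/json' \
  --data '{"project": 1, "bucket": "my-s3-bucket", "regex_filter": ".*\\.json"}' \
  http://localhost:8000/api/storages/s3
```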
## Create projects
With the change from teams to workspaces, the `team_id` parameter is no longer supported in the POST payload to create a project using the API.
Instead, do the following:
1. [Create a project](/api#operation/api_projects_create).
2. [Add the project to a workspace](/api#operation/api_workspaces_projects_create).
@ -0,0 +1,62 @@
---
title: Set up authentication for Label Studio
short: Set up authentication
badge: <i class='ent'></i>
type: guide
order: 221
meta_title: Authentication for Label Studio Enterprise
meta_description: Label Studio Enterprise documentation for setting up SSO and LDAP authentication for your data labeling, machine learning, and data science projects.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
Set up single sign-on using SAML to manage access to Label Studio using your existing Identity Provider (IdP), or use LDAP authentication.
<div class="enterprise"><p>
SSO and LDAP authentication are only available in Label Studio Enterprise Edition. If you're using Label Studio Community Edition, see <a href="label_studio_compare.html">Label Studio Features</a> to learn more.
</p></div>
## Set up SAML SSO
The organization owner for Label Studio Enterprise can set up SSO & SAML for the instance. Label Studio Enterprise supports the following IdPs:
- Microsoft Active Directory
- OneLogin
- others that use SAML assertions
After you set up SSO, you can no longer use native authentication to access the Label Studio UI unless you have the Owner role.
## Set up LDAP authentication
After you set up LDAP authentication, you can no longer use native authentication to access the Label Studio UI unless you have the Owner role.
Set up LDAP authentication and assign LDAP users to your Label Studio Enterprise organization using environment variables in Docker.
You can refer to this example environment variable file for your own LDAP setup:
```
AUTH_LDAP_ENABLED=1
AUTH_LDAP_SERVER_URI=ldap://www.example.com
AUTH_LDAP_BIND_DN=cn=ro_admin,ou=sysadmins,dc=zexample,dc=com
AUTH_LDAP_BIND_PASSWORD=zexamplepass
AUTH_LDAP_USER_DN_TEMPLATE=uid=%(user)s,ou=users,ou=guests,dc=zexample,dc=com
# Group parameters
AUTH_LDAP_GROUP_SEARCH_BASE_DN=ou=users,ou=guests,dc=zexample,dc=com
AUTH_LDAP_GROUP_SEARCH_FILTER_STR=(objectClass=groupOfNames)
AUTH_LDAP_GROUP_TYPE=ou
# Populate the user from the LDAP directory, values below are set by default
AUTH_LDAP_USER_ATTR_MAP_FIRST_NAME=givenName
AUTH_LDAP_USER_ATTR_MAP_LAST_NAME=sn
AUTH_LDAP_USER_ATTR_MAP_EMAIL=mail
# Specify the organization to assign users to on the platform
AUTH_LDAP_ORGANIZATION_OWNER_EMAIL=heartex@heartex.net
# Advanced options, read more about options and values here:
# https://www.python-ldap.org/en/latest/reference/ldap.html#options
AUTH_LDAP_CONNECTION_OPTIONS=OPT_X_TLS_CACERTFILE=/certificates/ca.crt;OPT_X_TLS_REQUIRE_CERT=OPT_X_TLS_DEMAND
```
After setting up LDAP authentication for your on-premises Label Studio Enterprise instance, you can use the credentials `guest1` and `guest1password` to log in and test the setup.
docs/source/guide/export.md Normal file
@ -0,0 +1,213 @@
---
title: Export annotations and data from Label Studio
short: Export annotations
type: guide
order: 415
meta_title: Export Annotations
meta_description: Label Studio documentation for exporting data labeling annotations in multiple formats that you can use in machine learning models and data science projects.
---
## Export data from Label Studio
Export your completed annotations from Label Studio. Label Studio stores your annotations in a raw JSON format in the SQLite database backend or whichever cloud or database storage you specify as target storage. Cloud storage buckets contain one file per labeled task, named `task_id.json`.
You can convert the raw JSON completed annotations stored by Label Studio into a more common format and export that data in several different ways:
- Export from the Label Studio UI on the [/export](http://localhost:8080/export) page.
- Call the API to export data. See the Label Studio [API documentation](api.html).
- For versions of Label Studio earlier than 1.0.0, run the relevant [converter tool](https://github.com/heartexlabs/label-studio-converter) on the directory of completed annotations using the command line or Python. You can also run the relevant converter tool on exported JSON from version 1.0.0.
## Export formats supported by Label Studio
Label Studio supports many common and standard formats for exporting completed labeling tasks. If you don't see a format that works for you, you can contribute one. See the [GitHub repository for the Label Studio Converter tool](https://github.com/heartexlabs/label-studio-converter).
### JSON
List of items in [raw JSON format](#Label-Studio-JSON-format-of-annotated-tasks) stored in one JSON file. Use to export both the data and the annotations for a dataset.
### JSON_MIN
List of items where only `"from_name", "to_name"` values from the [raw JSON format](#Label-Studio-JSON-format-of-annotated-tasks) are exported. Use to export only the annotations and the data for a dataset, and no Label-Studio-specific fields.
```json
{
"image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg",
"tag": [{
"height": 10.458911419423693,
"rectanglelabels": [
"Moonwalker"
],
"rotation": 0,
"width": 12.4,
"x": 50.8,
"y": 5.869797225186766
}]
}
```
### CSV
Results are stored as comma-separated values with the column names specified by the values of the `"from_name"` and `"to_name"` fields.
### TSV
Results are stored in a tab-separated tabular file with column names specified by the `"from_name"` and `"to_name"` values.
### CONLL2003
Popular format used for the [CoNLL-2003 named entity recognition challenge](https://www.clips.uantwerpen.be/conll2003/ner/).
### COCO
Popular machine learning format used by the [COCO dataset](http://cocodataset.org/#home) for object detection and image segmentation tasks.
### Pascal VOC XML
Popular XML-formatted task data used for object detection and image segmentation tasks.
### Brush labels to NumPy & PNG
Export your brush labels as NumPy 2d arrays and PNG images. Each label outputs as one image.
### ASR_MANIFEST
Export audio transcription labels for automatic speech recognition as the JSON manifest format expected by [NVIDIA NeMo models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/v0.11.0/collections/nemo_asr.html).
```json
{"audio_filepath": "/path/to/audio.wav", "text": "the transcription", "offset": 301.75, "duration": 0.82, "utt": "utterance_id", "ctm_utt": "en_4156", "side": "A"}
```
### YOLO
Export object detection annotations in the YOLOv3 format. You must have at least one `RectangleLabels` or similar object detection tag in your labeling configuration to use the YOLO export format.
## Label Studio JSON format of annotated tasks
When you annotate data, Label Studio stores the output in JSON format. The raw JSON structure of each completed task follows this example:
```json
{
"id": 1,
"data": {
"image": "https://example.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg"
},
"created_at":"2021-03-09T21:52:49.513742Z",
"updated_at":"2021-03-09T22:16:08.746926Z",
"project":83,
"annotations": [
{
"id": "1001",
"result": [
{
"from_name": "tag",
"id": "Dx_aB91ISN",
"source": "$image",
"to_name": "img",
"type": "rectanglelabels",
"value": {
"height": 10.458911419423693,
"rectanglelabels": [
"Moonwalker"
],
"rotation": 0,
"width": 12.4,
"x": 50.8,
"y": 5.869797225186766
}
}
      ],
"was_cancelled":false,
"ground_truth":false,
"created_at":"2021-03-09T22:16:08.728353Z",
"updated_at":"2021-03-09T22:16:08.728378Z",
"lead_time":4.288,
"result_count":0,
"task":1,
"completed_by":10
}
],
"predictions": [
{
"created_ago": "3 hours",
"model_version": "model 1",
"result": [
{
"from_name": "tag",
"id": "t5sp3TyXPo",
"source": "$image",
"to_name": "img",
"type": "rectanglelabels",
"value": {
"height": 11.612284069097889,
"rectanglelabels": [
"Moonwalker"
],
"rotation": 0,
"width": 39.6,
"x": 13.2,
"y": 34.702495201535505
}
}
]
},
{
"created_ago": "4 hours",
"model_version": "model 2",
"result": [
{
"from_name": "tag",
"id": "t5sp3TyXPo",
"source": "$image",
"to_name": "img",
"type": "rectanglelabels",
"value": {
"height": 33.61228406909789,
"rectanglelabels": [
"Moonwalker"
],
"rotation": 0,
"width": 39.6,
"x": 13.2,
"y": 54.702495201535505
}
}
]
}
]
}
```
### Relevant JSON property descriptions
Review the full list of JSON properties in the [API documentation](api.html).
| JSON property name | Description |
| --- | --- |
| id | Identifier for the labeling task from the dataset. |
| data | Data copied from the input data task format. See the documentation for [Task Format](tasks.html#Basic-Label-Studio-JSON-format). |
| project | Identifier for a specific project in Label Studio. |
| annotations | Array containing the labeling results for the task. |
| annotations.id | Identifier for the completed task. |
| annotations.lead_time | Time in seconds to label the task. |
| annotations.result | Array containing the results of the labeling or annotation task. |
| annotations.completed_by | User ID of the user that created the annotation. Matches the list order of users on the People page in the Label Studio UI. |
| result.id | Identifier for the specific annotation result for this task.|
| result.from_name | Name of the tag used to label the region. See [control tags](/tags). |
| result.to_name | Name of the object tag that provided the region to be labeled. See [object tags](/tags). |
| result.type | Type of tag used to annotate the task. |
| result.value | Tag-specific value that includes details of the result of labeling the task. The value structure depends on the tag for the label. [Explore each tag](/tags) for more details. |
| predictions | Array of machine learning predictions. Follows the same format as the completions array, with one additional parameter. |
| predictions.score | The overall score of the result, based on the probabilistic output, confidence level, or other. |
<!-- md image_units.md -->
<!-- md annotation_ids.md -->
@ -0,0 +1,154 @@
---
title: Frontend library
type: guide
order: 705
meta_title: Customize User Interface
meta_description: Label Studio documentation for integrating the Label Studio frontend interface into your own machine learning or data labeling application workflow.
---
The [Label Studio Frontend](https://github.com/heartexlabs/label-studio-frontend) (LSF) is the frontend library for Label Studio, based on React and mobx-state-tree and distributed as an NPM package. You can include it in your applications without using the Label Studio Backend (LSB) to provide data annotation support to your users. You can customize and extend the frontend library.
LSF is maintained in a separate GitHub repository:
https://github.com/heartexlabs/label-studio-frontend
<br>
<div style="margin:auto; text-align:center;"><img src="/images/LSF-modules.png" style="opacity: 0.9"/></div>
## Frontend development
Refer to the [Frontend reference guide](frontend_reference.html) when developing with Label Studio Frontend.
### Manual builds
If you want to build a new tag or change the behavior of default components inside LSF, go into the LSF repo and review the [Development part](https://github.com/heartexlabs/label-studio-frontend#development) of the README file. Making any changes requires a good knowledge of React and JavaScript.
### GitHub Artifacts
Use GitHub Artifacts to download a zip-formatted archive with LSF builds. Branches from the official LSF repo are built automatically and hosted on GitHub Artifacts.
See the [GitHub Actions for the LSF repository](https://github.com/heartexlabs/label-studio-frontend/actions) to access them.
You can also configure a GitHub token to obtain artifacts automatically:
```
export GITHUB_TOKEN=<token>
cd label-studio/scripts
node get-lsf-build.js <branch-name-from-official-lsf-repo>
```
### CDN
You can include `main.<hash>.css` and `main.<hash>.js` files from a CDN directly. Explore `https://unpkg.com/label-studio@<LS_version>/build/static/` (e.g. [0.7.3](https://unpkg.com/label-studio@0.7.3/build/static/)) to find the correct filenames of the js/css files.
```xhtml
<!-- Theme included stylesheets -->
<link href="https://unpkg.com/label-studio@0.7.3/build/static/css/main.14acfaa5.css" rel="stylesheet">
<!-- Main Label Studio library -->
<script src="https://unpkg.com/label-studio@0.7.3/build/static/js/main.0249ea16.js"></script>
```
## Frontend integration guide
You can use the Label Studio Frontend separately in your own projects by including it in your HTML page. Instantiate a new Label Studio object with a selector for the div that should become the editor.
To see all the available options for the initialization of LabelStudio object, see the [Label Studio Frontend](frontend_reference.html).
``` xhtml
<!-- Include Label Studio stylesheet -->
<link href="https://unpkg.com/label-studio@0.7.3/build/static/css/main.09b8161e.css" rel="stylesheet">
<!-- Create the Label Studio container -->
<div id="label-studio"></div>
<!-- Include the Label Studio library -->
<script src="https://unpkg.com/label-studio@0.7.3/build/static/js/main.e963e015.js"></script>
<!-- Initialize Label Studio -->
<script>
var labelStudio = new LabelStudio('label-studio', {
config: `
<View>
<Image name="img" value="$image"></Image>
<RectangleLabels name="tag" toName="img">
<Label value="Hello"></Label>
<Label value="World"></Label>
</RectangleLabels>
</View>
`,
interfaces: [
"panel",
"update",
"controls",
"side-column",
"annotations:menu",
"annotations:add-new",
"annotations:delete",
"predictions:menu"
],
user: {
pk: 1,
firstName: "James",
lastName: "Dean"
},
task: {
annotations: [],
predictions: [],
id: 1,
data: {
image: "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg"
}
},
onLabelStudioLoad: function(LS) {
var c = LS.annotationStore.addAnnotation({
userGenerate: true
});
LS.annotationStore.selectAnnotation(c.id);
},
onSubmitAnnotation: function(LS, annotation) {
// retrieve the annotation
console.log(annotation.serializeAnnotation())
}
});
</script>
```
## Custom LSF + LSB integration
LS Frontend (LSF) with Backend (LSB) integration is similar to what is described in the [Frontend integration guide](#Frontend-integration-guide). The JavaScript integration script is placed in [lsf-sdk.js](https://github.com/heartexlabs/label-studio/blob/master/label_studio/static/js/lsf-sdk.js) in the Label Studio Backend. The main idea of this integration is based on LSF callbacks.
1. Make your custom LSF build by following these [instructions](https://github.com/heartexlabs/label-studio-frontend#development). Finalize your development with `npm run build-bundle` to generate `main.<hash>.css` and `main.<hash>.js` files.
2. **Do not forget** to remove the old build from LSB:
```bash
rm -r label-studio/label_studio/static/editor/*
```
3. Copy build folder from LSF to LSB:
```bash
cp -r label-studio-frontend/build/static/{js,css} label-studio/label_studio/static/editor/
```
If you installed LS as a pip package, copy the build into `<env-path>/lib/python<version>/site-packages/label_studio/static/editor/` instead, as in the sketch below.
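A minimal sketch of that copy step for a pip installation (adjust `<env-path>` and `<version>` to your environment):
```bash
# Copy the freshly built LSF assets into the installed label-studio package
cp -r label-studio-frontend/build/static/{js,css} \
  <env-path>/lib/python<version>/site-packages/label_studio/static/editor/
```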
4. Run the LS instance as usual; it now uses the new LSF build:
```bash
label-studio start <your-project>
```
Check for the new build by exploring the source code of the Labeling page in your browser. You should see something like the following in the `<head>` section:
```xhtml
<!-- Editor CSS -->
<link href="static/editor/css/main.b50aa47e.css" rel="stylesheet">
<!-- Editor JS -->
<script src="static/editor/js/main.df658436.js"></script>
```
If you have duplicate css/js files, repeat these instructions from step 2.
@ -0,0 +1,272 @@
---
title: Frontend reference
type: guide
order: 905
meta_title: Frontend Library Reference
meta_description: Label Studio Frontend reference documentation for implementing the Label Studio Frontend into your own machine learning or data science application workflows.
---
Label Studio Frontend (LSF) includes a number of UI options and callbacks that you can use when implementing the frontend with a custom labeling backend, or when customizing the Label Studio interface.
## Updates to LSF in version 1.0.0
LSF version 1.0.0 is not compatible with earlier versions of Label Studio. If you use LSF with a custom backend, you must make changes to the API callbacks that you use as follows:
| Callback in 0.9.1 and earlier | Renamed callback in 1.0.0 |
| --- | --- |
| onSubmitCompletion | onSubmitAnnotation |
| onUpdateCompletion | onUpdateAnnotation |
| onDeleteCompletion | onDeleteAnnotation |
If you rely on specific formatting of Label Studio completed tasks, [Label Studio's annotation format](export.html#Raw-JSON-format-of-completed-tasks) has also been updated.
## Implement the Label Studio Frontend
```javascript
var labelStudio = new LabelStudio('editor', options);
```
The following options are recognized when initializing a Label Studio instance version 1.0.0.
## Options
### config
Default: `null`
Type data: `string`
XML labeling configuration for the task. Defines which objects and controls are available in the editor.
### interfaces
Default: `null`
Type data: `array`
Collection of UI elements to show:
```javascript
[
"annotations:add-new",
"annotations:delete",
"annotations:menu",
"controls",
"panel",
"predictions:menu",
"side-column",
"skip",
"submit"
"update",
]
```
- `annotations:add-new` - show add new annotations button
- `annotations:delete` - show delete current annotation button
- `annotations:menu` - show annotations menu
- `controls` - enable panel with controls (submit, update, skip)
- `panel` - navigation panel for current task with buttons: undo, redo and reset
- `predictions:menu` - show predictions menu
- `side-column` - enable panel with entities
- `skip` - show button to skip current task
- `submit` - show button to submit or update current annotation
- `update` - show button to update current task after submitting
### messages
Default: `null`
Type data: `object`
Messaging used for different actions
```javascript
{
DONE: "Done!",
NO_COMP_LEFT: "No more annotations",
NO_NEXT_TASK: "No more data available for labeling",
NO_ACCESS: "You don't have access to this task"
}
```
- `DONE` - Shown after the task is submitted to the server
- `NO_COMP_LEFT` - Shown if there are no more annotations
- `NO_NEXT_TASK` - No next task to load
- `NO_ACCESS` - Can't access the provided task
### description
Default: `No description`
Type data: `string`
Description of the current task.
### task
Task data
Default: `null`
Type data: `object`
```json
{
  id: 1,
  load: false,
  data: {
    text: "Labeling text..."
  },
  annotations: [],
  predictions: []
}
```
#### id
Type data: `integer`
Default: `null`
#### data
Type data: `object`
Data for the task, with keys matching the variables used in the labeling configuration.
#### annotations
Type data: `array`
Array of annotations. See the [annotation documentation](export.html#Raw-JSON-format-of-completed-tasks) for more information.
#### predictions
Type data: `array`
Array of predictions. Similar structure as completions or annotations. See the [annotation documentation](export.html#Raw-JSON-format-of-completed-tasks) and [guidance for importing predicted labels](predictions.html) for more information.
### user
User data
Type data: `object`
```json
{
pk: 1,
firstName: "Stanley",
lastName: "Kubrick"
}
```
#### pk
Type data: `number`
#### firstName
Type data: `string`
#### lastName
Type data: `string`
## Callbacks
Callbacks can be used to execute actions based on user interaction with the interface. For example, label-studio server uses callbacks to communicate with an API. Pass them along with other options when initiating the instance.
### onSubmitAnnotation
Type data: `function`
Called when the `submit` button is pressed. `ls` is the Label Studio instance; `annotation` is the value of the current annotation.
#### Example
```javascript
onSubmitAnnotation: function(ls, annotation) {
console.log(annotation)
}
```
### onUpdateAnnotation
Type data: `function`
Called when the `update` button is pressed. `ls` is the Label Studio instance; `annotation` is the value of the current annotation.
#### Example
```javascript
onUpdateAnnotation: function(ls, annotation) {
  console.log(annotation)
}
```
### onDeleteAnnotation
Type data: `function`
Called when the `delete` button is pressed. `ls` is the Label Studio instance; `annotation` is the value of the current annotation.
#### Example
```javascript
onDeleteAnnotation: function(ls, annotation) {
  console.log(annotation)
}
```
### onEntityCreate
Type data: `function`
Called when a new region gets labeled, for example, a new bbox is created. `region` is the object that was created.
#### Example
```javascript
onEntityCreate: function(region) {
console.log(region)
}
```
### onEntityDelete
Type data: `function`
Called when an existing region gets deleted. `region` is the object itself.
#### Example
```javascript
onEntityDelete: function(region) {
console.log(region)
}
```
### onSkipTask
Type data: `function`
Called when the `skip` button is pressed. `ls` is the Label Studio instance.
#### Example
```javascript
onSkipTask: function(ls) {
  console.log(ls)
}
```
### onLabelStudioLoad
Type data: `function`
Called when Label Studio has fully loaded and is ready for labeling. `ls` is the Label Studio instance.
#### Example
```javascript
onLabelStudioLoad: function(ls) {
  console.log(ls)
}
```
@ -0,0 +1,91 @@
---
title: Get started with Label Studio
short: Get started
type: guide
order: 100
meta_title: Get Started with Label Studio
meta_description: Get started with Label Studio by creating projects to label and annotate data for machine learning and data science models.
---
## What is Label Studio?
Label Studio is an open source data labeling tool for labeling and exploring multiple types of data. You can perform different types of labeling with many data formats.
You can also integrate Label Studio with machine learning models to supply predictions for labels (pre-labels), or perform continuous active learning. See [Set up machine learning with your labeling process](ml.html).
Label Studio is also available in Enterprise and Cloud editions with additional features. See [Label Studio features](label_studio_compare.html) for more.
## Quick start
1. Install Label Studio:
```bash
pip install label-studio
```
2. Start Label Studio
```bash
label-studio start
```
3. Open the Label Studio UI at http://localhost:8080.
4. Sign up with an email address and password that you create.
5. Click **Create** to create a project and start labeling data.
6. Name the project, and if you want, type a description and select a color.
7. Click **Data Import** and upload the data files that you want to use. If you want to use data from a local directory, cloud storage bucket, or database, skip this step for now.
8. Click **Labeling Setup** and choose a template and customize the label names for your use case.
9. Click **Save** to save your project.
You're ready to start [labeling and annotating your data](labeling.html)!
## Labeling workflow with Label Studio
All the steps required to start and finish a labeling project with Label Studio:
1. [Install Label Studio](install.html).
2. [Start Label Studio](start.html).
3. [Create accounts for Label Studio](signup.html). Create an account to manage and set up labeling projects.
4. <i class='ent'></i> [Restrict access to the project](manage_users.html). Set up role-based access control. Only available in Label Studio Enterprise Edition.
5. [Set up the labeling project](setup_project.html). Define the type of labeling to perform on the dataset and configure project settings.
6. [Set up the labeling interface](setup.html). Add the labels that you want annotators to apply and customize the labeling interface.
7. [Import data as labeling tasks](tasks.html).
8. [Label and annotate the data](labeling.html).
9. <i class='ent'></i> [Review the annotated tasks](quality.html). Only available in Label Studio Enterprise Edition.
10. [Export the labeled data or the annotations](export.html).
## Label Studio terminology
When you upload data to Label Studio, each item in the dataset becomes a labeling task. The following table describes some terms you might encounter as you use Label Studio.
| Term | Description |
| --- | --- |
| Dataset | What you import into Label Studio, comprised of individual items. |
| Task | What Label Studio transforms your individual dataset items into. |
| Labels | What you add to each dataset item while performing a labeling task in Label Studio. |
| Region | The portion of the dataset item that has a label assigned to it. |
| Relation | A defined relationship between two labeled regions. |
| Result | A label applied to a specific region. |
| Pre-labeling | What machine learning models perform in Label Studio or separate from Label Studio. The result of predicting labels for items in a dataset are predicted labels, or pre-annotations. |
| Annotations | The output of a labeling task. Previously called "completions". |
| Templates | Example labeling configurations that you can use to specify the type of labeling that you're performing with your dataset. See [all available templates](/templates). |
| Tags | Configuration options to customize the labeling interface. See [more about tags](/tags). |
## Components and architecture
You can use any of the Label Studio components in your own tools, or customize them to suit your needs. Before customizing Label Studio extensively, you might want to review Label Studio Enterprise Edition to see if it already contains the relevant functionality you want to build. See [Label Studio Features](label_studio_compare.html) for more.
The component parts of Label Studio are available as modular extensible packages that you can integrate into your existing machine learning processes and tools.
| Module | Technology | Description |
| --- | --- | --- |
| [Label Studio Backend](https://github.com/heartexlabs/label-studio/) | Python and [Django](https://www.djangoproject.com/) | Use to perform data labeling. |
| [Label Studio Frontend](https://github.com/heartexlabs/label-studio-frontend) | JavaScript web app using [React](https://reactjs.org/) and [MST](https://github.com/mobxjs/mobx-state-tree) | Perform data labeling in a user interface. |
| [Data Manager](https://github.com/heartexlabs/dm2) | JavaScript web app using [React](https://reactjs.org/) | Manage data and tasks for labeling. |
| [Machine Learning Backends](https://github.com/heartexlabs/label-studio-ml-backend) | Python | Predict data labels at various parts of the labeling process. |
<br>
<div style="margin:auto; text-align:center;"><img src="/images/ls-modules-scheme.png" style="opacity: 0.8"/></div>
<!--update to include data manager-->
## Information collected by Label Studio
Label Studio collects anonymous usage statistics about the number of page visits and data types being used in labeling configurations that you set up. No sensitive information is included in the information we collect. The information we collect helps us improve the experience of labeling data in Label Studio and helps us plan future data types and labeling configurations to support.
@ -0,0 +1,162 @@
---
title: Install and upgrade Label Studio
type: guide
order: 200
meta_title: Install and Upgrade
meta_description: Label Studio documentation for installing and upgrading Label Studio with Docker, pip, and anaconda to use for your machine learning and data science projects.
---
Install Label Studio on premises or in the cloud. Choose the installation method that works best for your environment:
- [Install with pip](#Install-with-pip)
- [Install with Docker](#Install-with-Docker)
- [Install on Ubuntu](#Install-on-Ubuntu)
- [Install from source](#Install-from-source)
- [Install with Anaconda](#Install-with-Anaconda)
- [Install for local development](#Install-for-local-development)
- [Upgrade Label Studio](#Upgrade-Label-Studio)
<!-- md deploy.md -->
### Web browser support
Label Studio is tested with the latest version of Google Chrome and is expected to work in the latest versions of:
- Google Chrome
- Apple Safari
- Mozilla Firefox
If using other web browsers, or older versions of supported web browsers, unexpected behavior could occur.
## Install prerequisite
Install Label Studio in a clean Python environment. We highly recommend using a virtual environment (venv or conda) to reduce the likelihood of package conflicts or missing packages.
## Install with pip
To install Label Studio with pip and a virtual environment, you need Python version 3.6 or later. Run the following:
```bash
python3 -m venv env
source env/bin/activate
python -m pip install label-studio
```
To install Label Studio with pip without a virtual environment, run the following:
```bash
pip install label-studio
```
After you install Label Studio, start the server with the following command:
```bash
label-studio
```
The default web browser opens automatically at [http://localhost:8080](http://localhost:8080) with Label Studio. See [start Label Studio](start.html) for more options when starting Label Studio.
## Install with Docker
Label Studio is also available as a Docker container. Make sure you have [Docker](https://www.docker.com/) installed on your machine.
### Install with Docker on *nix
To install and start Label Studio at [http://localhost:8080](http://localhost:8080), storing all labeling data in the `./mydata` directory, run the following:
```bash
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest
```
### Install with Docker on Windows
On Windows, you must modify the volume paths set by the `-v` option, as in the sketch below.
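For example, a sketch assuming your labeling data lives in `c:\mydata` (adjust the path for your system):
```bash
docker run -it -p 8080:8080 -v c:/mydata:/label-studio/data heartexlabs/label-studio:latest
```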
#### Override the default Docker install
You can override the default Docker install by appending new arguments:
```bash
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest label-studio --log-level DEBUG
```
### Build a local image with Docker
If you want to build a local image, run:
```bash
docker build -t heartexlabs/label-studio:latest .
```
### Run with Docker Compose
Use Docker Compose to serve Label Studio at `http://localhost:8080`.
Start Label Studio:
```bash
docker-compose up -d
```
This starts Label Studio with a PostgreSQL database backend. You can also use a PostgreSQL database without Docker Compose. See [Set up database storage](storedata.html).
## Install on Ubuntu
To install Label Studio on Ubuntu and run it in a virtual environment, run the following command:
```bash
python3 -m venv env
source env/bin/activate
sudo apt install python3.9-dev
python -m pip install label-studio
```
## Install from source
If you want to use nightly builds or extend the functionality, consider downloading the source code using Git and running Label Studio locally:
```bash
git clone https://github.com/heartexlabs/label-studio.git
cd label-studio
# Install all package dependencies
pip install -e .
# Run database migrations
python label_studio/manage.py migrate
# Start the server in development mode at http://localhost:8080
python label_studio/manage.py runserver
```
## Install with Anaconda
```bash
conda create --name label-studio python=3.8
conda activate label-studio
pip install label-studio
```
## Troubleshoot installation
You might see errors when installing Label Studio. Follow these steps to resolve them.
### Run the latest version of Label Studio
Many bugs might be fixed in patch releases or maintenance releases. Make sure you're running the latest version of Label Studio by upgrading your installation before you start Label Studio.
### Errors about missing packages
If you see errors about missing packages, install those packages and try to install Label Studio again. Make sure that you run Label Studio in a clean Python environment, such as a virtual environment.
For Windows users, the default installation might fail to build the `lxml` package. Consider manually installing it from [the unofficial Windows binaries](https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml). If you are running Windows 64-bit with Python 3.8 or later, run `pip install lxml-4.5.0-cp38-cp38-win_amd64.whl` to install it.
### Errors from Label Studio
If you see any other errors during installation, try to rerun the installation.
```bash
pip install --ignore-installed label-studio
```
## Upgrade Label Studio
To upgrade to the latest version of Label Studio, reinstall or upgrade using pip.
```bash
pip install --upgrade label-studio
```
Migration scripts run when you upgrade to version 1.0.0 from version 0.9.1 or earlier.
To make sure an existing project gets migrated, when you [start Label Studio](start.html), run the following command:
```bash
label-studio start path/to/old/project
```
The most important change to be aware of is the renaming of "completions" to "annotations". See the [updated JSON format for completed tasks](export.html#Raw_JSON_format_of_completed_tasks).
If you customized the Label Studio Frontend, see the [Frontend reference guide](frontend_reference.html) for required updates to maintain compatibility with version 1.0.0.
@ -0,0 +1,275 @@
---
title: Install Label Studio Enterprise on-premises using Docker
badge: <i class='ent'></i>
type: guide
order: 201
meta_title: Install Label Studio Enterprise on-premises using Docker
meta_description: Install, back up, and upgrade Label Studio Enterprise with Docker to create machine learning and data science projects on-premises.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
Install Label Studio Enterprise on-premises if you need to meet strong privacy regulations, legal requirements, or want to manage a custom installation on your own infrastructure using Docker or public cloud. To deploy Label Studio Enterprise on Amazon AWS in a Virtual Private Cloud (VPC), see [Install Label Studio Enterprise on AWS Private Cloud](install_enterprise_vpc.html).
You can run Label Studio Enterprise in an airgapped environment, and no data leaves your infrastructure. See [Secure Label Studio](security.html) for more details about security and hardening for Label Studio Enterprise.
<div class="enterprise"><p>
To install Label Studio Community Edition, see <a href="install.html">Install and Upgrade Label Studio</a>. This page is specific to the Enterprise version of Label Studio.
</p></div>
<!-- md deploy.md -->
## Install Label Studio Enterprise using Docker
1. Pull the latest image
2. Add the license file
3. Start the container using Docker, or use Docker Compose to run the server in development mode.
### Prerequisites
Make sure you have an authorization token to retrieve Docker images and a current license file. If you are a Label Studio Enterprise customer and do not have access, [contact us](mailto:hello@heartex.ai) to receive an authorization token and a copy of your license file.
### Pull the latest image
You must be authorized to use Label Studio Enterprise images.
1. Set up the Docker login to retrieve the latest Docker image:
```bash
docker login --username heartexlabs
```
When prompted to enter the password, enter the token. If login succeeds, a `~/.docker/config.json` file is created with the authorization settings.
> If you have default registries specified when logging into Docker, you might need to explicitly specify the registry: `docker login --username heartexlabs docker.io`.
2. Pull the latest Label Studio Enterprise image:
```bash
docker pull heartexlabs/heartex:latest
```
> Note: You might need to use `sudo` to log in or pull images.
### Add the license file
After you retrieve the latest Label Studio Enterprise image, add the license file. You can't start the Docker image without a license file.
1. Create a working directory called `heartex` and place the license file in it.
```bash
mkdir -p heartex
cd heartex
```
2. Move the license file, `license.txt`, to the `heartex` directory.
### Start using Docker
To run Label Studio Enterprise in production, start it using Docker. This configuration allows you to link Label Studio with external databases and services.
1. Create a file, `heartex/env.list` with the required environmental variables:
```
# The main server URL (must be a full path like protocol://host:port)
HEARTEX_HOSTNAME=http://localhost:8080
# Auxiliary hostname URL: some platform functionality requires generating URIs with a specific hostname;
# if HEARTEX_HOSTNAME is not accessible from the server side, use this variable to specify the server host
HEARTEX_INTERNAL_HOSTNAME=
# PostgreSQL database name
POSTGRE_NAME=postgres
# PostgreSQL database user
POSTGRE_USER=postgres
# PostgreSQL database password
POSTGRE_PASSWORD=
# PostgreSQL database host
POSTGRE_HOST=db
# PostgreSQL database port
POSTGRE_PORT=5432
# PostgreSQL SSL mode
POSTGRE_SSL_MODE=require
# Specify Postgre SSL certificate
POSTGRE_SSLROOTCERT=postgre-ca-bundle.pem
# Redis location e.g. redis://[:password]@localhost:6379/1
REDIS_LOCATION=localhost:6379
# Redis database
REDIS_DB=1
# Redis password
REDIS_PASSWORD=12345
# Redis socket timeout
REDIS_SOCKET_TIMEOUT=3600
# Use Redis SSL connection
REDIS_SSL=1
# Require certificate
REDIS_SSL_CERTS_REQS=required
# Specify Redis SSL certificate
REDIS_SSL_CA_CERTS=redis-ca-bundle.pem
```
2. After you set all the environment variables, run Docker exposing port 8080:
```bash
docker run -d \
-p 8080:8080 \
--env-file env.list \
-v `pwd`/license.txt:/heartex/web/htx/settings/license_docker.txt \
-v `pwd`/logs:/var/log/heartex \
-v `pwd`/postgre-ca-bundle.pem:/etc/ssl/certs/postgre-ca-bundle.pem \
-v `pwd`/redis-ca-bundle.pem:/etc/ssl/certs/redis-ca-bundle.pem \
--name heartex \
heartexlabs/heartex:latest
```
> Note: If you expose port 80, you must start Docker with `sudo`.
### Start using Docker Compose
To run Label Studio Enterprise in development mode, start Label Studio using Docker Compose and local PostgreSQL and Redis servers to store data and configurations.
> Follow these instructions only if you plan to use Label Studio Enterprise in development mode. Otherwise, see [Start Using Docker](#Start-using-Docker) on this page.
#### Prerequisites
Make sure [Docker Compose](https://docs.docker.com/compose/install/) is installed on your system.
#### Start Label Studio Enterprise in development mode
1. Create a configuration file `heartex/config.yml` with the following content:
```yaml
version: '3'
services:
db:
image: postgres:11.5
hostname: db
restart: always
environment:
- POSTGRES_HOST_AUTH_METHOD=trust
volumes:
- ./postgres-data:/var/lib/postgresql/data
- ./logs:/var/log/heartex
ports:
- 5432:5432
heartex:
image: heartexlabs/heartex:latest
container_name: heartex
volumes:
- ./license.txt:/heartex/web/htx/settings/license_docker.txt
environment:
- HEARTEX_HOSTNAME=http://localhost:8080
- POSTGRE_NAME=postgres
- POSTGRE_USER=postgres
- POSTGRE_PASSWORD=
- POSTGRE_PORT=5432
- POSTGRE_HOST=db
- REDIS_LOCATION=redis:6379
command: ["./deploy/wait-for-postgres.sh", "db", "supervisord"]
ports:
- 8080:8080
depends_on:
- redis
links:
- db
- redis
redis:
image: redis:5.0.6-alpine
hostname: redis
volumes:
- "./redis-data:/data"
ports:
- 6379:6379
```
If you have existing services running on ports 5432, 6379, or 8080, update the `config.yml` file to use different ports.
2. Start all servers using docker-compose:
```bash
docker-compose -f config.yml up
```
3. Open [http://localhost:8080](http://localhost:8080) in a browser and start using Label Studio Enterprise in development mode.
#### Data persistence
When the Label Studio Enterprise server runs with docker-compose, all essential data is stored inside the container. The following local file storage directories are linked to the container volumes to make sure data persists:
- `./postgres-data` contains PostgreSQL database
- `./redis-data` contains Redis dumps
The integrity of these folders ensures that your data is not lost even if you completely stop and remove all running containers and images. The `./postgres-data` files are specific to the PostgreSQL version. The current supported PostgreSQL version is 11.5.
## Update Label Studio Enterprise
1. [Back up your existing container](#Back-up-Label-Studio-Enterprise).
2. Pull the latest image
3. Update the container
### Get the Docker image version
To check the version of the Label Studio Enterprise Docker image, run [`docker ps`](https://docs.docker.com/engine/reference/commandline/ps/) on the host.
Run the following command as root or using `sudo` and review the output:
```bash
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b1dd57a685fb heartexlabs/heartex:latest "./deploy/start.sh" 36 minutes ago Up 36 minutes 0.0.0.0:8080->8000/tcp heartex
```
The image column displays the Docker image and version number. The image `heartexlabs/heartex:latest` is using the version `latest`.
### Back up Label Studio Enterprise
Back up your Label Studio Enterprise Docker container before you upgrade your version and for disaster recovery purposes.
1. From the command line, run Docker stop to stop the currently running container with Label Studio Enterprise:
```bash
docker stop heartex
```
2. Rename the existing container to avoid name conflicts when updating to the latest version:
```bash
docker rename heartex heartex-backup
```
You can then treat the `heartex-backup` image as a backup.
### Pull a new image
After backing up your existing container, pull the latest image of Label Studio Enterprise from the Docker registry.
```bash
docker pull heartexlabs/heartex:latest
```
### Update the container
After you pull the latest image, update your Label Studio Enterprise container:
```bash
docker run -d \
-p $EXPOSE_PORT:8080 \
-v `pwd`/license.txt:/heartex/web/htx/settings/license_docker.txt \
-v `pwd`/logs:/var/log/heartex \
-v `pwd`/postgre-ca-bundle.pem:/etc/ssl/certs/postgre-ca-bundle.pem \
-v `pwd`/redis-ca-bundle.pem:/etc/ssl/certs/redis-ca-bundle.pem \
--name heartex \
heartexlabs/heartex:latest
```
### Restore from a backed up container
If you decide to roll back to the previously backed up version of Label Studio Enterprise, stop and remove the new container and replace it with the backup.
1. From the command line, stop the latest running container and remove it:
```bash
docker stop heartex && docker rm heartex
```
2. Rename the backup container:
```bash
docker rename heartex-backup heartex
```
3. Start the backup container:
```bash
docker start heartex
```
@ -0,0 +1,186 @@
---
title: Install Label Studio Enterprise on AWS Private Cloud
badge: <i class='ent'></i>
type: guide
order: 202
meta_title: Install Label Studio Enterprise on AWS Private Cloud
meta_description: Install and upgrade Label Studio Enterprise on AWS VPC to create machine learning and data science projects.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
If you want to manage your own Label Studio Enterprise installation in the cloud, follow these steps to install it in AWS Virtual Private Cloud (VPC). You can also [install Label Studio Enterprise on-premises using Docker](install_enterprise.html) if you need to meet strong privacy regulations, legal requirements, or want to manage a custom installation on your own infrastructure.
<div class="enterprise"><p>
To install Label Studio Community Edition, see <a href="install.html">Install and Upgrade Label Studio</a>. This page is specific to the Enterprise version of Label Studio.
</p></div>
<!-- md deploy.md -->
## Install Label Studio Enterprise on AWS Private Cloud
You can deploy to your own private cloud with all necessary components provided by Amazon AWS services. The bundle comes with the following configuration for the Amazon services:
- Virtual Private Cloud (VPC)
- Identity and Access Management (IAM)
- Route53
- Elastic Container Registry (ECR)
- Elastic Container Service (ECS)
- Simple Storage Service (S3)
- ElastiCache
- Relational Database Service (RDS)
- Systems Manager Agent (SSM)
- CodeDeploy
Deployment scripts are distributed as [terraform](https://www.terraform.io/) configuration files (.tf).
### Prerequisites
Download and install the Terraform package for your operating system and architecture. The recommended Terraform version is v0.12.18 or higher.
You must have a root user with all the required permissions for deploying AWS components. If you do not have an active AWS profile with full administrative access, see [Create a root user](#Create-a-root-user) on this page.
#### Create a root user
1. In the `user/` directory of your VPC, review and if necessary, modify `user.tf` parameters to match the following:
```hcl
locals {
user_name = "heartex-production"
policy_name = "heartex-production"
bucket_name = "heartex-terraform-state" // S3 bucket name used to store terraform state
}
```
2. Initialize and run Terraform inside that directory:
```bash
terraform init
terraform apply
```
If Terraform completes successfully, you see the following output:
```bash
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
Outputs:
iam_access_key_id = AKIAAWSACCESSKEYID
iam_access_key_secret = 9FaIL7Eza8mAwSsEcReTAcCeSsKeY
```
3. Store the user credentials in your local environment in a specific AWS named profile. For example, append the following credentials to the `~/.aws/credentials` file:
```text
[heartex-production]
aws_access_key_id = AKIAAWSACCESSKEYID
aws_secret_access_key = 9FaIL7Eza8mAwSsEcReTAcCeSsKeY
```
4. After storing the user credentials, use them for your Label Studio instance by updating the `AWS_PROFILE` environment variable to reference these new credentials:
`export AWS_PROFILE=heartex-production`
For more about configuring AWS in your local environment, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) in the Amazon AWS documentation.
### Configure your VPC
Configure your Amazon VPC to work with Label Studio Enterprise.
> You must know your AWS account ID to perform these configurations. If you do not know it, you can retrieve it from the AWS STS API: `aws sts get-caller-identity`
1. In the `envs/production` directory, open the `main.tf` file.
2. In the `locals` section for your private cloud settings, make updates to match the following example. Update placeholders such as the `public_dns_namespace` or `account_id` to match the values relevant for your environment.
```hcl
locals {
stack_name = "heartex-production"
aws_region = "us-west-1"
aws_profile = "heartex-production"
public_dns_namespace = "my.heartex.com"
account_id = "490012345678" // your AWS account ID
images = ["webapp"]
image_webapp = "${local.account_id}.dkr.ecr.${local.aws_region}.amazonaws.com/${local.aws_profile}/webapp:latest"
vpc_cidr_block = "10.0.0.0/16"
cidrsubnet_newbits = 8
bucket_name = "heartex-bucket" // S3 storage for internal use
node_type = "cache.t2.micro"
allocated_storage = 5
user_name = "postgres" // DB user name
    storage_type = "gp2" // Storage type: standard, "gp2" (general-purpose SSD) or "io1" (SSD IOPS)
instance_class = "db.t2.micro" // DB instance type (https://aws.amazon.com/ru/rds/instance-types/)
licence_file_path = "${path.root}/license.txt"
// Additional parameters
redis_ssl = 0
redis_ssl_cert_reqs = "required"
// ...
```
### Deploy Label Studio Enterprise
1. Add a license to your VPC. Place the license file in the same directory as the `main.tf` file. For example, `envs/production/license.txt`.
2. Initialize Terraform so that you can use it to deploy the relevant modules. From the command line, run the following:
```bash
terraform init
```
3. Create an IAM role to use for deploying and updating images. From the command line, run the following:
```bash
terraform apply -target module.iam
```
Store the output credentials in a secure location. You can use them to make Lambda invocations to deploy updates to your VPC.
4. Create an elastic container registry (ECR) to store the Label Studio Enterprise images. From the command line, run the following:
```bash
terraform apply -target module.ecr
```
You see the following output in your console:
```bash
ecr_repository_urls = {
"webapp" = "490012345678.dkr.ecr.us-west-1.amazonaws.com/heartex-production/webapp"
}
iam_access_key_maintainer_id = AKIAXEGRLHACCESSKEYID
iam_access_key_maintainer_secret = mrKlneaqzXVcFKlTSECrEtAcCeSsKeY
```
Use the `ecr_repository_urls` values to make updates and push new Label Studio Enterprise versions to your VPC.
5. Upload images to the ECR. Make sure you have the latest Label Studio Enterprise Docker image. From the command line, run the following:
```bash
docker tag <image name:version> 490012345678.dkr.ecr.us-west-1.amazonaws.com/heartex-production/webapp
docker push 490012345678.dkr.ecr.us-west-1.amazonaws.com/heartex-production/webapp
```
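Docker must be authenticated against the registry before the push succeeds. One way to obtain the login (a sketch, assuming `boto3` is available locally; the region is a placeholder for your own):
```python
import base64
import boto3

# Fetch a temporary ECR auth token and print the equivalent `docker login`
# command; the registry endpoint comes back in the same response.
ecr = boto3.client("ecr", region_name="us-west-1")
auth = ecr.get_authorization_token()["authorizationData"][0]
user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
print(f"docker login -u {user} -p {password} {auth['proxyEndpoint']}")
```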
6. Configure a managed DNS service for your VPC with a public URL specified by the `public_dns_namespace` in the `main.tf` file. From the command line, run the following:
```bash
terraform apply -target module.route53
```
The output contains name servers records that you can use to [create a hosted zone in DNS](https://console.aws.amazon.com/route53/v2/hostedzones#CreateHostedZone) for your VPC.
```bash
route53_zone_name_servers = [
  "ns-1234.awsdns-12.org",
  "ns-5678.awsdns-12.co.uk",
  "ns-123.awsdns-34.com",
  "ns-456.awsdns-34.net"
]
```
DNS might take some time to update with the latest records.
7. Deploy the remaining modules. From the command line, run the following:
```bash
terraform apply
```
It takes about ten minutes for the modules to finish deploying.
If you see an error, `Error describing created certificate: Expected certificate to be issued but was in state PENDING_VALIDATION`, try finishing the deployment later. This error can be caused by DNS servers that have not yet been updated with the newest records; propagation can take some time.
## Update Label Studio Enterprise on AWS Private Cloud
To update Label Studio Enterprise on AWS Private Cloud, update your installation to the latest image in the ECR using blue/green deployment.
From the command line, run the following:
```bash
aws --region <selected-region> lambda invoke --function-name <chosen-stack-name>-deploy --payload '{"service": "webapp", "image": "<image-name-with-proper-version-tag>"}' result.json && cat result.json | jq .
```
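If you prefer to script updates, the same invocation can be sketched with `boto3` instead of the AWS CLI. The region, function name, and image tag below are placeholders; substitute your own values:
```python
import json
import boto3

# Invoke the <stack-name>-deploy Lambda to run a blue/green update of the
# webapp service. "webapp:2.0.0" stands in for your image name and version tag.
client = boto3.client("lambda", region_name="us-west-1")
response = client.invoke(
    FunctionName="heartex-production-deploy",
    Payload=json.dumps({"service": "webapp", "image": "webapp:2.0.0"}),
)
print(json.load(response["Payload"]))
```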
### Remove a private cloud instance
Destroy all created services by running [terraform destroy](https://www.terraform.io/docs/cli/commands/destroy.html):
```bash
terraform destroy
```

View File

@ -0,0 +1,185 @@
---
title: Label Studio features
type: guide
order: 108
meta_title: Label Studio Community and Enterprise Features
meta_description: Compare the features of Label Studio Community Edition with the paid Label Studio Enterprise Edition so that you can choose the best option for your data labeling and annotation projects.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
Label Studio is available as a Community Edition open source data labeling tool and as a paid version with extended functionality and support as an Enterprise Edition. [Contact us to get started with Label Studio Enterprise Edition](https://heartex.com/)!
<table>
<tr>
<th>Functionality</th>
<th>Community Edition</th>
<th>Enterprise Edition</th>
</tr>
<tr>
<td colspan="3"><b>User Management</b></td>
</tr>
<tr>
<td>User accounts to associate labeling activities to a specific user.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="manage_users.html">Role-based access control for each user account.</a></td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="manage_users.html">Organizations and workspaces to manage users and projects.</a></td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Project Management</b></td>
</tr>
<tr>
<td><a href="setup_project.html">Projects to manage data labeling activities.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="setup.html">Templates to get started with specific data labeling tasks faster.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Data Management</b></td>
</tr>
<tr>
<td>Manage your data in a user interface.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="tasks.html">Import data from many sources.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="export.html">Export data into many formats.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="storage.html">Synchronize data from and to remote data storage.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Data Labeling Workflows</b></td>
</tr>
<tr>
<td>Assign specific annotators to specific tasks.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="setup_project.html">Automatic queue management.</a></td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Label text, images, audio data, HTML, and time series data.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Label mixed types of data.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Synchronize data from and to remote data storage.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Annotator-specific view.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Annotator Performance</b></td>
</tr>
<tr>
<td>Control label quality by monitoring annotator agreement.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Manage and review annotator performance.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="quality.html">Verify model and annotator accuracy against ground truth annotations.</a></td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Verify annotation results.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Assign reviewers to review annotation results.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Machine Learning</b></td>
</tr>
<tr>
<td><a href="ml_create.html">Connect machine learning models to Label Studio with an SDK.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Accelerate labeling with active learning.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="ml.html">Automatically label dataset items with ML models.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Analytics and Reporting</b></td>
</tr>
<tr>
<td>Reporting and analytics on labeling and annotation activity.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Activity log to use to audit annotator activity.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="3"><b>Advanced Functionality</b></td>
</tr>
<tr>
<td><a href="api.html">API access to manage Label Studio.</a></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>On-premises hosting of Label Studio.</td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td><a href="auth_setup.html">Support for single sign-on using LDAP or SAML. </a></td>
<td style="text-align:center"></td>
<td style="text-align:center">✔️</td>
</tr>
</table>

View File

@ -0,0 +1,216 @@
---
title: Label and annotate data
type: guide
order: 402
meta_title: Label and annotate data
meta_description: Label and annotate data using the Label Studio UI to create bounding boxes, label text spans, set up relations, and filter and sort project data for your machine learning dataset creation.
---
Label and annotate your data with the open source data labeling tool, Label Studio. After you [set up your project](setup_project.html) and [labeling interface](setup.html) and [import your data](tasks.html), you can start labeling and annotating your data.
1. Open a project in Label Studio and optionally [filter or sort the data](#Filter-or-sort-project-data).
2. Click **Label** to [start labeling](#Start-labeling).
3. Use [keyboard shortcuts](#Use-keyboard-shortcuts-to-label-regions-faster) or your mouse to label the data and submit your annotations.
4. Follow the project instructions for labeling and deciding whether to skip tasks.
5. Click the project name to return to the data manager.
## Filter or sort project data
When you filter or sort the data before you label it, you change which tasks you see when labeling and the order in which they appear. While [task sampling](start.html#Set_up_task_sampling_for_your_project) affects the task order for an entire project and can't be changed, you can change task filters and sorting at any time.
### Example: Label new data first
Sort the data in your project by date to focus on labeling the newest data first.
1. In a project, update the **Order** of the data from the default to **Created at**.
2. Update the order of the items to be in ascending order, so the newest items appear first.
3. Click **Label** to start labeling tasks from newest to oldest.
### Example: Sort by prediction score
You can sort the data in your project by prediction score if you upload [pre-annotated data](predictions.html) with prediction scores, or if your [machine learning backend](ml.html) produces prediction scores as part of the model output.
1. In a project, update the **Order** of the data from the default to use the **Prediction score** field.
2. Update the order of the items in either ascending or descending order to label based on higher confidence or lower confidence predictions.
3. Click **Label** to start labeling tasks in prediction score order.
You can also use [task sampling](start.html#Set_up_task_sampling_for_your_project) to use prediction score ordering.
### Example: Split a dataset using tabs and filters
If you want to label a large dataset, you might want to use tabs and filters to split it up into smaller sections, and assign different annotators to different tabs. You can't assign annotators to specific tasks in Label Studio Community Edition, but you can rename the tabs after specific annotators as a lightweight way to assign tasks using tabs.
For example, you might split a dataset with 300 images into 3 different tabs, and have different annotators focus on each tab:
1. In a project, create a filter where the **ID** field **is between** the values "1" and "100". Click away from the filter to review the filtered items on the tab.
2. Click the vertical ellipsis for the tab and select **Rename**. Name it after the annotator that you want to work on the items in that tab.
3. Click the **+** icon to create a new tab. Click the vertical ellipsis for the new tab and select **Rename** to name it after a second annotator.
4. On the new tab, create a filter where the **ID** field **is between** the values "101" and "200". Click away from the filter to review the filtered items on the tab.
5. Click the **+** icon to create a new tab. Click the vertical ellipsis for the new tab and select **Rename** to name it after a third annotator.
6. On the new tab, create a filter where the **ID** field **is between** the values "201" and "300". Click away from the filter to review the filtered items on the tab.
7. Any annotator can log in and navigate to the relevant tab for their work and click the **Label** button to start labeling the subset of tasks on their tab.
## Start labeling
From a project, click **Label** to start labeling. You can also label a specific task by clicking it when viewing the data in a project, but you won't automatically see the next task in the labeling queue after submitting your annotations.
Some labeling tasks can be complicated to perform, for example, labeling that includes text, image, and audio data objects as part of one dataset and labeling task, or creating relations between annotations on a labeling task.
### Label a region in the data
Annotate a section of the data by adding a region.
1. Select the label you want to apply to the region. For some configurations, you can skip this step.
2. Click the text, image, audio, or other data object to apply the label to the region. Your changes save automatically.
3. Click **Submit** to submit the completed annotation and move on to the next task.
### Label overlapping regions
When you label with bounding boxes and other image segmentation tasks, or when you're highlighting text for NLP and NER labeling, you might want to label overlapping regions. To do this easily, hide labeled regions after you annotate them.
1. Select the label that you want to apply to the region.
2. Draw the bounding box or highlight the text that you want to label.
3. In the **Regions** or **Labels** sidebar, locate and select the region that you labeled and click the eye icon to hide the region.
4. Select the next label that you want to apply to the overlapping region.
5. Draw the bounding box or highlight the text that you want to label.
6. Continue hiding and labeling regions until you've completed annotating the task. If you want, select the eye icon next to **Regions** to hide and then show all regions labeled on the task to confirm the end result.
7. Click **Submit** to submit the completed annotation and move on to the next task.
### Change the label
You can change the label of an existing region.
1. Select the labeled region, for example a span, bounding box, image segment, audio region, or other region.
2. Select a new label. Your changes to the label save automatically.
3. Click **Submit** to submit the completed annotation and move on to the next task.
### Delete an annotation
After labeling a region, you can delete the annotation.
1. Select the labeled region.
2. Press the Backspace key, or go to the **Results** panel and remove the selected annotation.
You can also delete all annotations on a task from the project page. See [Delete tasks or annotations](setup.html#Delete_tasks_or_annotations).
### Add relations between annotations
You can create relations between two results with both directions and labels. To add labels to directions, you must set up a labeling config with the relations tag. See more about [relations with labels](/tags/relations.html) in the Tags documentation.
1. Select the region for the annotation that you want to relate to another annotation. If you're creating a direction-based relation, select the first one first.
2. In the **Regions** section of the **Results** sidebar, click the **Create Relation** button that looks like a hyperlink icon.
3. Select the second region for the annotation to complete the relation.
<br>
<img src="../images/relation.png">
After you relate two annotation regions, you can modify the relation in the **Relations** section of the **Results** sidebar.
- To change the direction of the relation, click the direction button between the two related regions.
- To add labels to the direction arrow indicating the relation between two annotations, click the vertical ellipsis button next to the two related regions to add your predefined labels. You must have a [label configuration that includes relations](/tags/relations.html) to do this.
### Skip a task
When annotators skip a task, the task no longer appears in the labeling queue for that annotator. Other annotators still see the task in their labeling queue.
## Label with collaborators
In both Label Studio and Label Studio Enterprise, you can label tasks with collaborators. Tasks are locked while someone performs annotations so that you don't accidentally overwrite the annotations of another annotator. After the other annotator finishes with the task, it can appear in your queue for labeling if the minimum annotations per task is set to more than one. By default, tasks only need to be annotated by one annotator.
<div class="enterprise"><p>
If you're using Label Studio Enterprise and want more than one annotator to annotate tasks, <a href="setup_project.html">update the project settings</a>. After you update the minimum annotations required per task, annotators can use the Label Stream workflow to label their tasks.
</p></div>
To label tasks multiple times, even if the minimum annotations required is set to one, do the following:
1. In the data manager for the project, click a task to open the quick labeling view.
2. Click the `+` icon next to the task annotation ID to open an annotation tab.
3. Label the task.
4. Click **Submit** to save your annotation.
5. Click the next task in the data manager to open the quick labeling view for that task and repeat steps 2-4.
## Use keyboard shortcuts to label regions faster
Use keyboard shortcuts (hotkeys) to improve your labeling performance. When performing a labeling task, click the gear icon to see more details about hotkeys or to enable or disable hotkeys.
This table describes the hotkeys for a standard keyboard. For a Mac keyboard, use return and delete instead of enter and backspace.
| Key | Description |
| --- | --- |
| ctrl+enter | Submit a task |
| ctrl+backspace | Delete all regions |
| escape | Exit relation mode |
| backspace | Delete selected region |
| alt+shift+$n | Select region number $n |
## Customize the labeling interface
Click the settings icon when labeling to configure the labeling interface to suit your labeling use case.
For example, keep a label selected after creating a region, display labels on bounding boxes, polygons and other regions while labeling, and show line numbers for text labeling.
<center>
<img src='../images/lsf-settings.png'>
</center>
You can also modify the layout of the screen, hide or show predictions, annotations, or the results panel, and hide or show various controls and buttons.
## Advanced image labeling
If you want to perform advanced image labeling, follow these examples and guidance for assistance.
### Add multiple types of regions to image annotations
You can add multiple types of regions to image annotations. You can add any of the following:
- Rectangles
- Ellipses
- Keypoints
- Polygons
- Brush masks
To add different types of regions to your image annotations, follow this example.
Create a custom template for your labeling interface using the following example:
```xml
<View>
  <Image name="image" value="$image" />
  <Rectangle name="rect" toName="image" />
  <Ellipse name="ellipse" toName="image" />
  <KeyPoint name="kp" toName="image" />
  <Polygon name="polygon" toName="image" />
  <Brush name="brush" toName="image" />
  <Choices name="choices" toName="image">
    <Choice value="yes"></Choice>
    <Choice value="no"></Choice>
  </Choices>
  <Labels name="labels" toName="image" fillOpacity="0.5" strokeWidth="5">
    <Label value="building" background="green"></Label>
    <Label value="vehicle" background="blue"></Label>
  </Labels>
</View>
```
This example makes rectangles, ellipses, polygons, keypoints, and brush masks available to the annotator, along with image classification choices of yes and no, and region labels of building and vehicle.
### Faster image labeling
You can add a rectangle or an ellipse to your image with just two clicks, or double click to create a polygon, rectangle, or ellipse.
If you accidentally select a point on an image while creating a polygon, just double click to remove the erroneous point and continue creating the region. There must be at least three points on the polygon to be able to remove a point.
### Create regions without labels
When you're annotating images, you can create regions without applying labels.
1. Create a region by double-clicking or clicking and dragging to create a bounding box, or click the points necessary to construct a polygon.
2. Select the created region in the sidebar or on the image.
3. Select the label that you want to apply to the region.
4. Repeat these steps for any regions that you want to create.
This can be helpful for two-step labeling, where you want one annotator to create regions and another annotator to label the regions.
By default, regions without labels appear gray.
### Erase brush mask labels
If you make a mistake when labeling with the brush mask, you can erase it. You must select a brush region in the sidebar before you can erase any part of it.
If you want to completely remove a region and start over, delete the region instead of erasing it. Erasing a region does not delete it.
<!-- md annotation_ids.md -->

View File

@ -0,0 +1,343 @@
---
title: Manage access to Label Studio
short: Manage access
badge: <i class='ent'></i>
type: guide
order: 251
meta_title: Manage Role-Based Access Control in Label Studio
meta_description: Manage access and set up permissions with user roles, organizations, and project workspaces for your data labeling, machine learning, and data science projects in Label Studio Enterprise.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
Manage access to projects, organizations, and workspaces in Label Studio to restrict who can view data, annotations, and predictions in your data labeling projects.
<div class="enterprise"><p>
Role-based access control, organizations, and workspaces are available only in Label Studio Enterprise Edition. For information about users in the open source Label Studio Community Edition, see <a href="signup.html">Set up user accounts for Label Studio</a>.
</p></div>
## Roles in Label Studio Enterprise
There are five roles available in Label Studio Enterprise Edition. Organization members have different levels of access to projects and workspaces. Every member can label tasks.
| Role | Description |
| --- | --- |
| Owner | Not an assignable role. Manages Label Studio. Can create organizations, modify workspaces, create and modify projects, and view activity log. |
| Administrator | Manages an organization. Has full access to all projects. Can modify workspaces, view activity logs, and approve invitations. Can't see the workspace owner's account page. |
| Manager | Manages projects. Can view any project and has full access to their own projects. |
| Reviewer | Reviews annotated tasks. Can view projects with tasks assigned to them. Can review and update task annotations. |
| Annotator | Labels tasks. Can view projects with tasks assigned to them and label tasks in those projects. |
## Roles and workspaces
Use a combination of roles, which control what actions users can take, and project workspaces, which control what data and projects users can access.
For example, a project annotator using Label Studio sees only the projects they have access to:
<img src="/images/LSE/LSE-annotator-view.jpg" width=400 height=275 alt="Diagram showing that only Label Studio projects that they have been added to are visible to an annotator."/>
A Label Studio administrator sees all projects and workspaces that exist in the Label Studio instance:
<img src="/images/LSE/LSE-admin-view.jpg" width=600 height=400 alt="Diagram showing that an administrator can view all projects and workspaces in a Label Studio instance."/>
## Permissions in Label Studio Enterprise
<table>
<tr>
<th>Action</th>
<th>Annotator</th>
<th>Reviewer</th>
<th>Manager</th>
<th>Administrator</th>
<th>Owner</th>
</tr>
<tr>
<td colspan="6"><b>User Management</b></td>
</tr>
<tr>
<td>Change user roles</td>
<td></td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>View People page</td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Invite people to organization</td>
<td></td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Workspace access</td>
<td style="text-align:center">R</td>
<td style="text-align:center">R</td>
<td style="text-align:center">CRUD</td>
<td style="text-align:center">CRUD</td>
<td style="text-align:center">CRUD</td>
</tr>
<tr>
<td colspan="6"><b>Project Management</b></td>
</tr>
<tr>
<td>Project access</td>
<td style="text-align:center">R</td>
<td style="text-align:center">R</td>
<td style="text-align:center">CRUD</td>
<td style="text-align:center">CRUD</td>
<td style="text-align:center">CRUD</td>
</tr>
<tr>
<td>Save custom project templates</td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="6"><b>Data Access</b></td>
</tr>
<tr>
<td>View project data</td>
<td>If permitted in project settings, can view own.</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Import data</td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Export data</td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="6"><b>Data Labeling Workflows</b></td>
</tr>
<tr>
<td>Assign annotators to tasks</td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Access labeling workflow</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Access review workflow</td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Monitor annotator agreement</td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Review annotator performance</td>
<td>Own</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Verify annotation results</td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td>Assign reviewers to tasks</td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="6"><b>Advanced</b></td>
</tr>
<tr>
<td>API access to equivalent Label Studio functionality</td>
<td></td>
<td></td>
<td style="text-align:center">✔️ for own or workspace projects</td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
<tr>
<td colspan="6"><b>Analytics</b></td>
</tr>
<tr>
<td>Track what happens and when on annotation dashboards</td>
<td>Own</td>
<td>Project</td>
<td style="text-align:center">Workspace and invited projects</td>
<td style="text-align:center">Organization</td>
<td style="text-align:center">Organization</td>
</tr>
<tr>
<td>View annotator dashboard</td>
<td style="text-align:center">✔️</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>View system-wide activity log</td>
<td></td>
<td></td>
<td></td>
<td style="text-align:center">✔️</td>
<td style="text-align:center">✔️</td>
</tr>
</table>
## Set up role-based access control (RBAC) with Label Studio
Set up role-based access control in Label Studio by using [organizations and workspaces to organize projects](#Use-organizations-to-manage-data-and-projects) and assigning roles to organization members. Use roles to control what actions organization members can perform in Label Studio, and manage organization and workspace membership to manage what data and projects those people can access.
Only people with the Administrator and Owner roles can invite people to Label Studio and manage their role membership.
### Invite users to Label Studio Enterprise
Invite users to your organization by doing the following:
1. In the Label Studio UI, click the hamburger icon to expand the left-hand menu and click **Organization**.
2. On the Organization page, click **+ Add People**.
3. In the dialog box that appears, click **Copy Link** and share the invitation link to your Label Studio instance with the people that you want to join your organization.
### Assign roles to invited users
After a user that you invite clicks the link and signs up for an account, their account exists but must be activated by an organization owner or administrator. When you activate someone's account, you also assign them a role in Label Studio.
To activate a user account and assign a role, do the following:
1. In the Label Studio UI, click the hamburger icon to expand the left-hand menu and click **Organization**.
2. Locate the user with a status of **Not Activated**.
3. Select the drop-down under **Role** and select the relevant role for the user.
Your changes save automatically. Repeat these steps for any additional users.
### Programmatically assign roles
To programmatically activate and assign roles to users, you can use the following API endpoints.
#### Assign a role to a user
For a given user ID and a given organization ID, POST a request to the `/api/organizations/{id}/memberships` endpoint with the following body:
```json
{
  "user_id": Int,
  "role": "NO" | "DI" | "OW" | "AD" | "MA" | "AN" | "RE"
}
```
Enumerate a role with one of the following abbreviations:
| Role | Full Role Name |
| --- | --- |
| NO | Not Activated |
| DI | Deactivated |
| OW | Owner |
| AD | Administrator |
| MA | Manager |
| AN | Annotator |
| RE | Reviewer |
For example, to set a user with an ID of 9 as an annotator, POST the following request body:
```json
{
  "user_id": 9,
  "role": "AN"
}
```
#### Determine the organization ID or user ID
If you're not sure what the organization ID is, you can do the following:
- If you only have one organization in your Label Studio instance, use `0`.
- If you have multiple organizations, make a GET request to the `/api/organizations/` endpoint.
To retrieve user IDs for the members of an organization, make a GET request to `/api/organizations/{id}/memberships`.
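As a sketch of those calls from Python with the `requests` library (the host, token, and IDs below are placeholders for your own values):
```python
import requests

LS_HOST = "http://localhost:8080"
HEADERS = {"Authorization": "Token <user-token-from-account-page>"}
ORG_ID = 0  # use 0 if your instance has a single organization

# Activate user 9 and assign the Annotator role
resp = requests.post(
    f"{LS_HOST}/api/organizations/{ORG_ID}/memberships",
    headers=HEADERS,
    json={"user_id": 9, "role": "AN"},
)
resp.raise_for_status()

# Retrieve user IDs for the members of the organization
members = requests.get(
    f"{LS_HOST}/api/organizations/{ORG_ID}/memberships", headers=HEADERS
).json()
print(members)
```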
## Use organizations to manage data and projects
To manage organization membership, use the **Organization** page in the Label Studio UI. When you sign up for Label Studio Enterprise for the first time, an organization associated with your account is automatically created. You become the owner of that organization. People who join Label Studio Enterprise from an invitation link or with an LDAP or SSO role join an existing organization.
If permitted by your Label Studio Enterprise plan, you can create organizations in Label Studio to further separate access to data and projects. For example, you could create separate organizations to separate work and access between completely unrelated departments. If some departments might collaborate with each other on a project, you can use one organization for both and instead use workspaces to organize the projects that they might or might not be collaborating on.
For example, you might set up one of the following possible configurations:
- One organization for your company, with one workspace for the support department and another for the development team, with specific projects in each workspace for different types of customer requests.
<img src="/images/LSE/LSE-one-org-many-workspaces.jpg" alt="Diagram showing Label Studio with one organization with multiple workspaces and projects within each workspace."/>
- Multiple organizations, such as one for the customer claims department and another for the customer support department, with specific workspaces in each organization for specific types of insurance, such as home insurance claims and auto insurance claims, and specific projects in each workspace for types of claims, such as Accident Claims, Injury Claims, Natural Disaster Claims. The Customer support organization might have workspaces specific to the types of support queues, with projects for specific types of calls received.
<img src="/images/LSE/LSE-multiple-orgs-workspaces.jpg" alt="Diagram showing Label Studio with three organizations, each one with multiple workspaces and projects within each workspace."/>
When you [assign a user role](manage_users.html) to an organization member, they hold that role for all workspaces and projects for that organization.
Managers within an organization can see all workspaces in that organization, even if they don't have access to perform actions in them. Annotators and reviewers can only see projects, not workspaces.
If you have access to multiple organizations, use the **Organizations** page to switch between the organizations that you are a member of.
## Create workspaces to organize projects
Within an organization, owners, administrators, and managers can create and manage workspaces. Workspace managers can only manage workspaces that they create or have been added to.
Create a workspace to organize projects by doing the following:
1. In the Label Studio UI, click the `+` sign next to **Workspaces** in the menu.
2. Name the workspace, and if you want, select a color.
3. Click **Save**.
After creating a workspace, you can create projects for that workspace, or use the **Project Settings** page to move a project to the new workspace. Your private sandbox also functions as a workspace, but only you can see projects in your sandbox.
### Add or remove members to a workspace
From a specific workspace inside the Label Studio UI, do the following:
1. Click **Manage Members**.
2. Use the search functionality to locate the user that you want to add to the workspace.
3. Select the checkbox next to their name and click the `>` arrow so that they appear in the list of users that **Belong to the Workspace**.
4. Click **Save**.
You can also remove yourself or other members from a workspace by following the same process and removing members with the `<` arrow.
### Sandbox workspace
Each user has a personal Sandbox workspace that they can use to experiment with project settings and get familiar with Label Studio. After you set up a project and want others to collaborate on it with you, you can update the project workspace in the **Project Settings**. You cannot add members to your Sandbox workspace.
### Delete a workspace
You can only delete a workspace if it has no projects. If you want to delete a workspace, first delete the projects or move them to another workspace.
To delete a workspace, do the following:
1. In the Label Studio UI, open the workspace.
2. Click the gear icon next to the workspace name.
3. In the dialog box that appears, click **Delete Workspace**. If the button is not available to select, the workspace still contains projects.

179
docs/source/guide/ml.md Normal file
View File

@ -0,0 +1,179 @@
---
title: Set up machine learning
short: Machine learning setup
type: guide
order: 606
meta_title: Set up machine learning with Label Studio
meta_description: Connect Label Studio to machine learning frameworks using the Label Studio ML backend SDK to integrate your model development pipeline seamlessly with your data labeling workflow.
---
Set up machine learning with your labeling process by setting up a machine learning backend for Label Studio.
With Label Studio, you can set up your favorite machine learning models to do the following:
- **Pre-labeling** by letting models predict labels and then perform further manual refinements.
- **Auto-labeling** by letting models create automatic annotations.
- **Online Learning** by simultaneously updating your model while new annotations are created, letting you retrain your model on-the-fly.
- **Active Learning** by selecting example tasks that the model is uncertain how to label for your annotators to label.
With these capabilities, you can use Label Studio as part of a production-ready **Prediction Service**.
## What is the Label Studio ML backend?
The Label Studio ML backend is an SDK that you can use to wrap your machine learning code and turn it into a web server. You can then connect that server to a Label Studio instance to perform 2 tasks:
- Dynamically pre-annotate data based on model inference results
- Retrain or fine-tune a model based on recently annotated data
For example, for an image classification task, the model pre-selects an image class for data annotators to verify. For audio transcriptions, the model displays a transcription that data annotators can modify.
The overall steps of setting up a Label Studio ML backend are as follows:
1. Get your model code.
2. Wrap it with the [Label Studio SDK](ml_create.html).
3. Create a running server script.
4. Launch the script.
5. Connect Label Studio to the ML backend in the UI.
Follow the [Quickstart](#Quickstart) for an example. For assistance with steps 1-3, see how to [create your own machine learning backend](ml_create.html).
If you need to load static pre-annotated data into Label Studio, running an ML backend might be more than you need. Instead, you can [import pre-annotated data](predictions.html).
## Quickstart
Get started with a machine learning (ML) backend with Label Studio. You need to start both the machine learning backend and Label Studio to start labeling. You can review examples in the [`label-studio-ml/examples` section of the Label Studio ML backend repository](https://github.com/heartexlabs/label-studio-ml-backend/tree/master/label_studio_ml/examples).
Follow these steps to set up an example text classifier ML backend with Label Studio:
1. Clone the Label Studio Machine Learning Backend git repository.
```bash
git clone https://github.com/heartexlabs/label-studio-ml-backend
```
2. Set up the environment.
It is highly recommended to use `venv`, `virtualenv`, or `conda` Python environments. You can use the same environment as Label Studio. [Read more in the Python documentation](https://docs.python.org/3/tutorial/venv.html#creating-virtual-environments) about creating virtual environments via `venv`.
```bash
cd label-studio-ml-backend
# Install label-studio-ml and its dependencies
pip install -U -e .
# Install example dependencies
pip install -r label_studio_ml/examples/requirements.txt
```
3. Initialize an ML backend based on an example script:
```bash
label-studio-ml init my_ml_backend \
  --script label_studio_ml/examples/simple_text_classifier.py
```
This ML backend is an example provided by Label Studio. See [how to create your own ML backend](ml_create.html).
4. Start the ML backend server.
```bash
label-studio-ml start my_ml_backend
```
5. Start Label Studio. Run the following:
```bash
label-studio start
```
6. Create a project and import text data. Set up the labeling interface to use the **Text Classification** template.
7. In the **Machine Learning** section of the project settings page, add the link `http://localhost:9090` to your machine learning model backend. You can also connect the backend through the API, as sketched below.
<br>
<center><img src="/images/ml-backend-card.png"></center>
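If you prefer to connect the backend programmatically rather than through the project settings page, a request like the following should work. This is a sketch: the `/api/ml` request body shown here (the `url`, `project`, and `title` fields) is an assumption based on the rest of this guide; substitute your own host, token, and project ID.
```python
import requests

# Register the ML backend started above (http://localhost:9090) with project 1.
resp = requests.post(
    "http://localhost:8080/api/ml",
    headers={"Authorization": "Token <user-token-from-account-page>"},
    json={
        "url": "http://localhost:9090",  # the ML backend server
        "project": 1,                    # your project ID
        "title": "my_ml_backend",
    },
)
resp.raise_for_status()
```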
If you run into any issues, see [Troubleshoot machine learning](ml_troubleshooting.html).
## Train a model
After you connect a model to Label Studio as a machine learning backend, you can start training the model:
- Manually using the Label Studio UI, click the **Start Training** button on the **Machine Learning** settings for your project.
- Automatically after any annotations are submitted or updated, enable the option `Start model training after annotations submit or update` on the **Machine Learning** settings for your project.
- Manually using the API, cURL the API from the command line, specifying the ID of your project (a Python equivalent is sketched after this list):
```bash
curl -X POST http://localhost:8080/api/ml/{id}/train
```
You must have at least one task annotated before you can start training.
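For reference, the same training call from Python (a sketch; the ID and token are placeholders, the `{id}` follows the cURL example above, and the `Authorization` header is needed when authentication is enabled on your instance):
```python
import requests

# Trigger training for the ML backend connected with ID 1.
requests.post(
    "http://localhost:8080/api/ml/1/train",
    headers={"Authorization": "Token <user-token-from-account-page>"},
).raise_for_status()
```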
In development mode, training logs appear in the web browser console. In production mode, you can find runtime logs in `my_backend/logs/uwsgi.log` and RQ training logs in `my_backend/logs/rq.log` on the server running the ML backend, which might be different from the Label Studio server. To see more detailed logs, start the ML backend server with the `--debug` option.
## Get predictions from a model
After you connect a model to Label Studio as a machine learning backend, you can see model predictions in the labeling interface if the model is pre-trained, or right after it finishes training.
If the model has not been trained yet, do the following to get predictions to appear:
1. Start labeling data in Label Studio.
2. Return to the **Machine Learning** settings for your project and click **Start Training** to start training the model.
3. In the data manager for your project, select the tasks that you want to get predictions for and select **Retrieve predictions** using the drop-down actions menu. Label Studio sends the selected tasks to your ML backend.
4. After retrieving the predictions, they appear in the task preview and Label stream modes for the selected tasks.
You can also retrieve predictions automatically by loading tasks. To do this, enable `Retrieve predictions when loading a task automatically` on the **Machine Learning** settings for your project. When you scroll through tasks in the data manager for a project, the predictions for those tasks are automatically retrieved from the ML backend. Predictions also appear when labeling tasks in the Label stream workflow.
> Note: For a large dataset, the HTTP request to retrieve predictions might be interrupted by a timeout. If you want to **get all predictions** for all tasks in a dataset, the recommended way is to make a [POST call to the predictions endpoint of the Label Studio API](https://api.labelstud.io/#operation/api_predictions_create) on the ML backend side for each generated prediction.
If you want to retrieve predictions manually for a list of tasks **using only an ML backend**, make a GET request to the `/predict` URL of your ML backend with a payload of the tasks that you want to see predictions for, formatted like the following example:
```json
{
  "tasks": [
    {"data": {"text": "some text"}}
  ]
}
```
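A sketch of that request using Python's `requests` library, assuming the ML backend from the Quickstart is listening on port 9090:
```python
import requests

# Ask the ML backend directly for predictions on one task.
payload = {"tasks": [{"data": {"text": "some text"}}]}
resp = requests.get("http://localhost:9090/predict", json=payload)
print(resp.json())
```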
## Delete predictions
If you want to delete all predictions from Label Studio, you can do it using the UI or the API:
- For a specific project, select the tasks that you want to delete predictions for and select **Delete predictions** from the drop-down menu.
- Using the API, run the following from the command line to delete the predictions for a specific project ID:
```bash
curl -H 'Authorization: Token <user-token-from-account-page>' -X POST \
"http://localhost:8080/api/dm/actions?id=delete_tasks_predictions&project=<id>"
```
## Set up a machine learning backend with Docker Compose
Label Studio includes everything you need to set up a production-ready ML backend server powered by Docker.
The Label Studio machine learning server uses [uWSGI](https://uwsgi-docs.readthedocs.io/en/latest/) and [supervisord](http://supervisord.org/) and handles background training jobs with [RQ](https://python-rq.org/).
### Prerequisites
Perform these prerequisites to make sure your server starts successfully.
1. Specify all requirements in a `my-ml-backend/requirements.txt` file. For example, to specify scikit-learn as a requirement for your model, do the following:
```requirements.txt
scikit-learn
```
2. Make sure ports 9090 and 6379 are available and do not have services running on them. To use different ports, update the default ports in `my-ml-backend/docker-compose.yml`, created after you start the machine learning backend.
### Start with Docker Compose
1. Start the machine learning backend with an example model or your [custom machine learning backend](ml_create.html).
```bash
label-studio-ml init my-ml-backend --script label_studio_ml/examples/simple_text_classifier.py
```
```
You see configurations in the `my-ml-backend/` directory that you need to build and run a Docker image using Docker Compose.
2. From the `my-ml-backend/` directory, start Docker Compose.
```bash
docker-compose up
```
The machine learning backend server starts listening on port 9090.
3. Connect the machine learning backend to Label Studio on the **Machine Learning** settings for your project in Label Studio UI.
If you run into any issues, see [Troubleshoot machine learning](ml_troubleshooting.html).
## Active Learning
The process of creating annotated training data for supervised machine learning models is often expensive and time-consuming. Active Learning is a branch of machine learning that seeks to **minimize the total amount of data required for labeling by strategically sampling observations** that provide new insight into the problem. In particular, Active Learning algorithms aim to select diverse and informative data for annotation, rather than random observations, from a pool of unlabeled data using **prediction scores**. For more theory, read [our article on Towards Data Science](https://towardsdatascience.com/learn-faster-with-smarter-data-labeling-15d0272614c4).
You can select a task ordering such as `Prediction score` in the Data Manager so that the sampling strategy fits the active learning scenario. Label Studio automatically sends a training signal to the ML backend on each annotation submit or update. You can enable these training signals on the **Machine Learning** settings page for your project.
* If you need to retrieve and save predictions for all tasks, check recommendations from a [topic below](ml.html#Get-predictions-from-a-model).
* If you want to delete all predictions after your model is retrained, check [this topic](ml.html#Delete-predictions).
<br>
<img src="/images/ml-backend-active-learning.png" style="border:1px #eee solid">

View File

@ -0,0 +1,90 @@
---
title: Write your own ML backend
type: guide
order: 607
meta_title: Machine Learning SDK
meta_description: Set up your machine learning model to output and consume predictions in your data science and data labeling projects.
---
Set up a machine learning model as a backend to Label Studio so that you can dynamically output and consume predictions as labeling occurs. You can follow this tutorial to wrap custom machine learning model code with the Label Studio ML SDK, or refer to [example ML backend tutorials](ml_tutorials.html) to integrate with popular machine learning frameworks such as PyTorch, GPT2, and others.
## Prerequisites
Before you start integrating your custom model code with the Label Studio ML SDK to use it as an ML backend with Label Studio, determine the following:
1. The expected inputs and outputs for your model. In other words, the type of labeling that your model supports in Label Studio, which informs the [Label Studio labeling config](setup.html#Set-up-the-labeling-interface-for-your-project). For example, text classification labels of "Dog", "Cat", or "Opossum" could be possible inputs and outputs.
2. The [prediction format](predictions.html) returned by your ML backend server.
3. The required packages and dependencies necessary to run your machine learning model.
## Create a machine learning backend
This example tutorial outlines how to wrap a simple text classifier based on the [scikit-learn](https://scikit-learn.org/) framework with the Label Studio ML SDK.
Start by creating a class declaration. You can create a Label Studio-compatible ML backend server in one command by inheriting it from `LabelStudioMLBase`.
```python
from label_studio_ml.model import LabelStudioMLBase
class MyModel(LabelStudioMLBase):
```
Then, define loaders & initializers in the `__init__` method.
```python
def __init__(self, **kwargs):
    # don't forget to initialize base class...
    super(MyModel, self).__init__(**kwargs)
    self.model = self.load_my_model()
```
There are special variables provided by the inherited class:
- `self.parsed_label_config` is a Python dict that provides the Label Studio project config structure. You might want to use this to align your model input/output with the Label Studio labeling configuration;
- `self.label_config` is a raw labeling config string;
- `self.train_output` is a Python dict with the results of the previous model training runs (the output of the `fit()` method described below). Use this if you want to load the model for subsequent updates during active learning and model fine-tuning.
After you define the loaders, you can define two methods for your model: an inference call and a training call.
## Inference call
Use an inference call to get pre-annotations from your model on-the-fly. You must update the existing predict method in the example ML backend scripts to make them work for your specific use case.
Write your own code to override the `predict(tasks, **kwargs)` method, which takes [JSON-formatted Label Studio tasks](tasks.html#Basic-Label-Studio-JSON-format) and returns predictions in the [format accepted by Label Studio](predictions.html).
**Example**
This example defines an inference call that pulls the labeling configuration schema and then outputs the predictions from your model in that format so that Label Studio can understand and display the predictions in the Label Studio UI. This example uses a labeling configuration that uses the [`Choices` tag](/tags/choices.html).
```python
def predict(self, tasks, **kwargs):
    predictions = []
    # Get annotation tag first, and extract from_name/to_name keys from the labeling config to make predictions
    from_name, schema = list(self.parsed_label_config.items())[0]
    to_name = schema['to_name'][0]
    for task in tasks:
        # for each task, return classification results in the form of "choices" pre-annotations
        predictions.append({
            'result': [{
                'from_name': from_name,
                'to_name': to_name,
                'type': 'choices',
                'value': {'choices': ['My Label']}
            }],
            # optionally you can include prediction scores that you can use to sort the tasks and do active learning
            'score': 0.987
        })
    return predictions
```
## Training call
Use the training call to update your model with new annotations. You don't need to use this call in your code, for example if you just want to pre-annotate tasks without retraining the model. If you do want to retrain the model based on annotations from Label Studio, use this method.
Write your own code to override the `fit(completions, **kwargs)` method, which takes [JSON-formatted Label Studio annotations](https://labelstud.io/guide/export.html#Raw-JSON-format-of-completed-labeled-tasks) and returns an arbitrary dict where some information about the created model can be stored.
> Note: The `completions` field is deprecated as of Label Studio 1.0.x and will be replaced with `annotations` in a future release of this SDK.
**Example**
```python
def fit(self, completions, workdir=None, **kwargs):
    # ... do some heavy computations, get your model and store checkpoints and resources
    return {'checkpoints': 'my/model/checkpoints'}  # <-- you can retrieve this dict as self.train_output in the subsequent calls
```
After you wrap your model code with the class, define the loaders, and define the methods, you're ready to run your model as an ML backend with Label Studio. See the [Quickstart](ml.html#Quickstart).

View File

@ -0,0 +1,57 @@
---
title: Troubleshoot machine learning
type: guide
order: 609
meta_title: Troubleshoot Machine Learning
meta_description: Troubleshoot Label Studio connections with machine learning frameworks using the Label Studio ML backend SDK.
---
After you [set up machine learning with Label Studio](ml.html) or [create your own machine learning backend](ml_create.html) to use with Label Studio, you can troubleshoot any issues you encounter by reviewing the possible causes on this page.
## Troubleshoot by reviewing the ML server logs
You can investigate most problems using the server console log. The machine learning backend runs as a separate server from Label Studio, so make sure you check the correct server console logs while troubleshooting. To see more detailed logs, start the ML backend server with the `--debug` option.
If you're running an ML backend:
- Production training logs are located in `my_backend/logs/rq.log`
- Production runtime logs are located in `my_backend/logs/uwsgi.log`
In development mode, training logs appear in the web browser console.
If you're running an ML backend using Docker Compose:
- Training logs are located in `logs/rq.log`
- Main process and inference logs are located in `logs/uwsgi.log`
## I launched the ML backend, but it appears as **Disconnected** after adding it in the Label Studio UI
Your ML backend server might not have started properly.
1. Check whether the ML backend server is running. Run the following health check:<br/> `curl -X GET http://localhost:9090/health`
2. If the health check doesn't respond, or you see errors, check the server logs.
3. If you used Docker Compose to start the ML backend, check for requirements missing from the `requirements.txt` file used to set up the environment inside Docker.
## The ML backend seems to be connected, but after I click "Start Training", I see an "Error. Click here for details." message
Click the error message to review the traceback. Common errors that you might see include:
- Insufficient number of annotations completed for training to begin.
- Memory issues on the server.
If you can't resolve the traceback issues by yourself, <a href="http://slack.labelstud.io.s3-website-us-east-1.amazonaws.com?source=docs-ML">contact us on Slack</a>.
## My predictions are wrong or I don't see the model prediction results on the labeling page
Your ML backend might be producing predictions in the wrong format.
- Check to see whether the ML backend predictions format follows the same structure as [predictions in imported pre-annotations](predictions.html).
- Confirm that your project's label configuration matches the output produced by your ML backend. For example, use the Choices tag to create a class of predictions for text. See more [Label Studio tags](/tags.html).
## The model backend fails to start or run properly
If you see errors about missing packages in the terminal after starting your ML backend server, or in the logs, you might need to specify additional packages in the `requirements.txt` file for your ML backend.
## ML backend is unable to access tasks
Because the ML backend and Label Studio are different services, the assets (images, audio, and so on) that you label must be hosted where the machine learning backend can access them by URL; otherwise it might fail to create predictions.
## I get a validation error when adding the ML backend
If you get a validation error when adding the ML backend URL to your Label Studio project, check the following:
- Is the labeling interface set up with a valid configuration?
- Is the machine learning backend running? Run the following health check:<br/> `curl -X GET http://localhost:9090/health`
- Is your machine learning backend available from your Label Studio instance? It must be available to the instance running Label Studio.
If you're running Label Studio in Docker, you must run the machine learning backend inside the same Docker container, or otherwise make it available to the Docker container running Label Studio. You can use the `docker exec` command to run commands inside the Docker container, or use `docker exec -it <container_id> /bin/sh` to start a shell in the context of the container. See the [docker exec documentation](https://docs.docker.com/engine/reference/commandline/exec/).

View File

@ -0,0 +1,108 @@
---
title: ML Examples and Tutorials
type: guide
order: 608
meta_title: Machine Learning Example Tutorials
meta_description: Label Studio tutorial documentation for setting up a machine learning model with predictions using PyTorch, GPT2, Sci-kit learn, and other popular frameworks in your data science and data labeling projects.
---
<link rel="stylesheet" href="/tutorials/styles.css">
<style>
h1 {
margin-top: 0 !important;
margin-left: 1em !important;
}
@media screen and (min-width: 900px) {
.content {
margin-top: 1em;
margin-left: 290px !important;
}
}
</style>
<div class="blog-body">
<div class="grid">
<!-- Simple -->
<div class="column">
<a href="/tutorials/dummy_model.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/tutorials/simple-image-classification.png); background-size:cover" class="image"></div>
</div>
<div class="category">image classification, starter</div>
<div class="desc"></div>
<div class="title">Create a simple ML backend</div>
</div>
</a>
</div>
<!-- Text classification -->
<div class="column">
<a href="/tutorials/sklearn-text-classifier.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/tutorials/text-classification.png); background-size:cover" class="image"></div>
</div>
<div class="category">text classification</div>
<div class="title">Text classification with Scikit-Learn</div>
</div>
</a>
</div>
<!-- Image Object Detector -->
<div class="column">
<a href="/tutorials/object-detector.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/tutorials/object-detection-with-bounding-boxes.png); background-size:cover" class="image"></div>
</div>
<div class="category">object detection, image</div>
<div class="title">OpenMMLab Image object detector</div>
</div>
</a>
</div>
<!-- Transfer learning for images with PyTorch -->
<div class="column">
<a href="/tutorials/pytorch-image-transfer-learning.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/tutorials/image-classification-pytorch.png); background-size:cover" class="image"></div>
</div>
<div class="category">image classification</div>
<div class="title">Transfer learning for images with PyTorch</div>
</div>
</a>
</div>
<!-- Chatbot response generation with HuggingFace's GPT2 model -->
<div class="column">
<a href="/tutorials/gpt.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/tutorials/gpt2.png)" class="image"></div>
</div>
<div class="category">Text generation</div>
<div class="title">Chatbot response generation with HuggingFace's GPT2 model</div>
</div>
</a>
</div>
<!-- Automatic Speech Recognition with Nvidia's NeMo -->
<div class="column">
<a href="/tutorials/nemo_asr.html">
<div class="card">
<div class="image-wrap">
<div class="image" style="background-image: url(/tutorials/nemo.png)"></div>
</div>
<div class="category">Audio ASR</div>
<div class="title">Automatic Speech Recognition with Nvidia's NeMo</div>
</div>
</a>
</div>
</div>

View File

@ -0,0 +1,618 @@
---
title: Import pre-annotated data into Label Studio
short: Import pre-annotations
type: guide
order: 301
meta_title: Import pre-annotated data into Label Studio
meta_description: Import predicted labels, predictions, pre-annotations, or pre-labels into Label Studio for your data labeling, machine learning, and data science projects.
---
If you have predictions generated for your dataset from a model, either as pre-annotated tasks or pre-labeled tasks, you can import the predictions with your dataset into Label Studio for review and correction. Label Studio automatically displays the pre-annotations that you import on the Labeling page for each task.
To import predicted labels into Label Studio, you must use the [Basic Label Studio JSON format](tasks.html#Basic-Label-Studio-JSON-format) and set up your tasks with the `predictions` JSON key. The Label Studio ML backend also outputs tasks in this format.
For image pre-annotations, Label Studio expects the x, y, width, and height of image annotations to be provided as percentages of the overall image dimensions. See [Units for image annotations](predictions.html#Units_for_image_annotations) on this page for more about how to convert formats.
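For example, a minimal conversion sketch, assuming you already know the pixel coordinates of a bounding box and the original image size:
```python
def to_percent(x_px, y_px, w_px, h_px, img_w, img_h):
    """Convert pixel bounding box coordinates to the percentage units
    that Label Studio expects for image pre-annotations."""
    return {
        "x": x_px / img_w * 100,
        "y": y_px / img_h * 100,
        "width": w_px / img_w * 100,
        "height": h_px / img_h * 100,
    }

# Hypothetical example: a 195x181 px box at (30, 52) in a 600x403 px image.
print(to_percent(30, 52, 195, 181, 600, 403))
```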
Import pre-annotated tasks into Label Studio [using the UI](tasks.html#Import-data-from-the-Label-Studio-UI) or [using the API](/api#operation/projects_import_create).
## Import pre-annotations for images
For example, import predicted labels for tasks to determine whether an item in an image is an airplane or a car.
Use the following labeling configuration:
```xml
<View>
<Choices name="choice" toName="image" showInLine="true">
<Choice value="Boeing" background="blue"/>
<Choice value="Airbus" background="green" />
</Choices>
<RectangleLabels name="label" toName="image">
<Label value="Airplane" background="green"/>
<Label value="Car" background="blue"/>
</RectangleLabels>
<Image name="image" value="$image"/>
</View>
```
After you set up an example project, create example tasks that match the following format.
<br/>
{% details <b>Click to expand the example image JSON</b> %}
Save this example JSON as a file to import it into Label Studio, for example, `example_prediction_task.json`.
{% codeblock lang:json %}
[{
"data": {
"image": "http://localhost:8080/static/samples/sample.jpg"
},
"predictions": [{
"result": [
{
"id": "result1",
"type": "rectanglelabels",
"from_name": "label", "to_name": "image",
"original_width": 600, "original_height": 403,
"image_rotation": 0,
"value": {
"rotation": 0,
"x": 4.98, "y": 12.82,
"width": 32.52, "height": 44.91,
"rectanglelabels": ["Airplane"]
}
},
{
"id": "result2",
"type": "rectanglelabels",
"from_name": "label", "to_name": "image",
"original_width": 600, "original_height": 403,
"image_rotation": 0,
"value": {
"rotation": 0,
"x": 75.47, "y": 82.33,
"width": 5.74, "height": 7.40,
"rectanglelabels": ["Car"]
}
},
{
"id": "result3",
"type": "choices",
"from_name": "choice", "to_name": "image",
"value": {
"choices": ["Airbus"]
}
}],
"score": 0.95
}]
}]
{% endcodeblock %}
In this example, there are three results inside one prediction, or pre-annotation:
- `result1` - the first bounding box
- `result2` - the second bounding box
- `result3` - a choice selection
The prediction score applies to the entire prediction.
{% enddetails %}
<br/>
Import pre-annotated tasks into Label Studio [using the UI](tasks.html#Import-data-from-the-Label-Studio-UI) or [using the API](/api#operation/projects_import_create).
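If you use the API, the following sketch posts the saved file to the import endpoint; the host, project ID, and access token are placeholders that you need to replace with your own values:
```python
import json
import requests

API_TOKEN = "your-access-token"  # placeholder: copy it from Account & Settings
PROJECT_ID = 1                   # placeholder: your project's ID

with open("example_prediction_task.json") as f:
    tasks = json.load(f)

response = requests.post(
    f"http://localhost:8080/api/projects/{PROJECT_ID}/import",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json=tasks,
)
print(response.status_code, response.json())
```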
In the Label Studio UI, the imported prediction for this task looks like the following:
<center><img src="../images/predictions_loaded.png" alt="screenshot of the Label Studio UI showing an image of airplanes with bounding boxes covering each airplane." style="width: 100%; max-width: 700px"></center>
## Import pre-annotated regions for images
If you want to import images with pre-annotated regions without labels assigned to them, follow this example.
Use the following labeling configuration:
```xml
<View>
<View style="display:flex;align-items:start;gap:8px;flex-direction:row">
<Image name="image" value="$image" zoom="true" zoomControl="true" rotateControl="false"/>
<Rectangle name="rect" toName="image" showInline="false"/>
</View>
<Ellipse name="ellipse" toName="image"/>
<KeyPoint name="kp" toName="image"/>
<Polygon name="polygon" toName="image"/>
<Brush name="brush" toName="image"/>
<Labels name="labels" toName="image" fillOpacity="0.5" strokeWidth="5">
<Label value="Vehicle" background="green"/>
<Label value="Building" background="blue"/>
<Label value="Pavement" background="red"/>
</Labels>
</View>
```
After you set up an example project, create example tasks that match the following format.
<br/>
{% details <b>Click to expand the example image region JSON</b> %}
Save this example JSON as a file to import it into Label Studio, for example, `example_prediction_task.json`.
{% codeblock lang:json %}
[{
"id":8,
"predictions":[
{
"id":10,
"result":[
{
"original_width":800,
"original_height":450,
"image_rotation":0,
"value":{
"x":55.46666666666667,
"y":2.3696682464454977,
"width":35.86666666666667,
"height":46.91943127962085,
"rotation":0
},
"id":"ABC",
"from_name":"rect",
"to_name":"image",
"type":"rectangle"
},
{
"original_width":800,
"original_height":450,
"image_rotation":0,
"value":{
"x":58.4,
"y":64.21800947867298,
"width":30.533333333333335,
"height":19.90521327014218,
"rotation":0
},
"id":"DEF",
"from_name":"rect",
"to_name":"image",
"type":"rectangle"
},
{
"original_width":800,
"original_height":450,
"image_rotation":0,
"value":{
"points":[
[
20.933333333333334,
28.90995260663507
],
[
25.866666666666667,
64.69194312796209
],
[
38.4,
62.796208530805686
],
[
34.13333333333333,
27.488151658767773
]
]
},
"id":"GHI",
"from_name":"polygon",
"to_name":"image",
"type":"polygon"
},
{
"original_width":800,
"original_height":450,
"image_rotation":0,
"value":{
"x":8.4,
"y":20.14218009478673,
"radiusX":4,
"radiusY":7.109004739336493,
"rotation":0
},
"id":"JKL",
"from_name":"ellipse",
"to_name":"image",
"type":"ellipse"
}
],
"task":8
}
],
"data":{
"image":"/data/upload/31159626248_d0362d027c_c.jpg"
},
"project":4
}]
{% endcodeblock %}
In this example, there are four regions inside one `result` field for a prediction, or pre-annotation:
- region `ABC` - a rectangle bounding box
- region `DEF` - a second rectangle bounding box
- region `GHI` - a polygonal segmentation
- region `JKL` - an ellipse
None of the regions have labels applied. The labeling configuration must use the `Rectangle` tag instead of the `RectangleLabels` tag to support this type of prediction. Even though the labeling configuration for this example has a `Labels` tag, the predictions do not need to specify labels for the regions.
{% enddetails %}
<br/>
<!-- md image_units.md -->
## Import pre-annotations for text
In this example, import pre-annotations for text using the [named entity recognition template](/templates/named_entity.html):
```xml
<View>
<Labels name="label" toName="text">
<Label value="Person"></Label>
<Label value="Organization"></Label>
<Label value="Fact"></Label>
<Label value="Money"></Label>
<Label value="Date"></Label>
<Label value="Time"></Label>
<Label value="Ordinal"></Label>
<Label value="Percent"></Label>
<Label value="Product"></Label>
<Label value="Language"></Label>
<Label value="Location"></Label>
</Labels>
<Text name="text" value="$text"></Text>
</View>
```
### Example JSON
This example JSON file contains two tasks, each with two sets of pre-annotations from different models. The first task also contains prediction scores for each NER span.
<br/>
{% details <b>Click to expand the example NER JSON</b> %}
Save this example JSON as a file, for example: `example_preannotated_ner_tasks.json`.
{% codeblock lang:json %}
[
{
"data": {
"text": "All that changed when he was 27 and he came to Jerusalem. It was the weekend of both Easter and Passover, and the city was flooded with tourists."
},
"predictions": [
{
"model_version": "one",
"result": [
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 29,
"end": 31,
"score": 0.70,
"text": "27",
"labels": [
"Date"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 47,
"end": 56,
"score": 0.65,
"text": "Jerusalem",
"labels": [
"Location"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 65,
"end": 76,
"score": 0.95,
"text": "the weekend",
"labels": [
"Date"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 85,
"end": 91,
"score": 0.50,
"text": "Easter",
"labels": [
"Date"
]
}
}
]
},
{
"model_version": "two",
"result": [
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 29,
"end": 31,
"score": 0.55,
"text": "27",
"labels": [
"Date"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 47,
"end": 56,
"score": 0.40,
"text": "Jerusalem",
"labels": [
"Location"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 65,
"end": 76,
"score": 0.32,
"text": "the weekend",
"labels": [
"Time"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 85,
"end": 91,
"score": 0.22,
"text": "Easter",
"labels": [
"Location"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 96,
"end": 104,
"score": 0.96,
"text": "Passover",
"labels": [
"Date"
]
}
}
]
}
]
},
{
"data": {
"text": " Each journal was several inches thick and bound in leather. On one page are drawn portraits of Sunny in a flowery, Easter dress and sun hat. On another page are hundreds of sketches of leaves that Niyati saw in her yard."
},
"predictions": [
{
"model_version": "one",
"result": [
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 17,
"end": 31,
"text": "several inches",
"labels": [
"Product"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 63,
"end": 66,
"text": "one",
"labels": [
"Percent"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 95,
"end": 100,
"text": "Sunny",
"labels": [
"Person"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 161,
"end": 169,
"text": "hundreds",
"labels": [
"Percent"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 197,
"end": 203,
"text": "Niyati",
"labels": [
"Person"
]
}
}
]
},
{
"model_version": "two",
"result": [
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 17,
"end": 31,
"text": "several inches",
"labels": [
"Fact"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 63,
"end": 66,
"text": "one",
"labels": [
"Percent"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 95,
"end": 100,
"text": "Sunny",
"labels": [
"Time"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 115,
"end": 121,
"text": "Easter",
"labels": [
"Location"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 161,
"end": 169,
"text": "hundreds",
"labels": [
"Money"
]
}
},
{
"from_name": "label",
"to_name": "text",
"type": "labels",
"value": {
"start": 197,
"end": 203,
"text": "Niyati",
"labels": [
"Person"
]
}
}
]
}
]
}
]
{% endcodeblock %}
{% enddetails %}
Import pre-annotated tasks into Label Studio [using the UI](tasks.html#Import-data-from-the-Label-Studio-UI) or [using the API](/api#operation/projects_import_create).
In the Label Studio UI, the imported prediction for the first task looks like the following:
<center><img src="../images/predictions_loaded_text.png" alt="screenshot of the Label Studio UI showing the text with highlighted text labels and prediction scores visible." style="width: 100%; max-width: 700px"></center>
You can sort the prediction scores for each labeled region using the **Regions** pane options.
## Troubleshoot pre-annotations
If you encounter unexpected behavior after you import pre-annotations into Label Studio, review this guidance to resolve the issues.
### Check the configuration values of your labeling configuration and tasks
The `from_name` of the pre-annotation task JSON must match the value of the name in the `<Labels name="label" toName="text">` portion of the labeling configuration. The `to_name` must match the `toName` value.
In the text example on this page, the JSON includes `"from_name": "label"` to correspond with the `<Labels name="label"` and `"to_name": "text"` to correspond with the `toName="text"` of the labeling configuration. The default template might contain `<Labels name="ner" toName="text">`. To work with this example JSON, you need to update the values to match.
In the image example on this page, the XML includes
```xml
...
<Choices name="choice" toName="image" showInLine="true">`
...
<RectangleLabels name="label" toName="image">
...
```
which correspond to the following portions of the example JSON:
```json
...
"type": "rectanglelabels",
"from_name": "label", "to_name": "image",
...
type": "choices",
"from_name": "choice", "to_name": "image",
...
```
### Check the labels in your configuration and your tasks
Make sure that you have a labeling configuration set up for the labeling interface, and that the labels in your JSON file exactly match the labels in your configuration. If you're using a [tool to transform your model output](https://github.com/heartexlabs/label-studio-transformers), make sure that the labels aren't altered by the tool.
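If you prefer to check this programmatically, the following sketch compares the labels used in a pre-annotation file against a hand-maintained set of labels from your configuration; the file name and label set below are assumptions based on the NER example on this page:
```python
import json

# Keep this set in sync with the <Label value="..."/> entries
# in your labeling configuration.
configured_labels = {
    "Person", "Organization", "Fact", "Money", "Date", "Time",
    "Ordinal", "Percent", "Product", "Language", "Location",
}

with open("example_preannotated_ner_tasks.json") as f:
    tasks = json.load(f)

used_labels = {
    label
    for task in tasks
    for prediction in task.get("predictions", [])
    for region in prediction["result"]
    for label in region["value"].get("labels", [])
}

# Any label printed here does not exist in the configuration and
# will not render in the labeling interface.
print("Unknown labels:", used_labels - configured_labels)
```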

View File

@ -0,0 +1,122 @@
---
title: Review annotations in Label Studio
short: Review annotations
badge: <i class='ent'></i>
type: guide
order: 410
meta_title: Review annotation quality in Label Studio
meta_description: Review the annotations produced by annotators in your Label Studio data labeling projects and evaluate annotator performance against ground truth annotations, predictions, and other annotator's annotations to produce high-quality data for your machine learning models.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
After multiple labelers have annotated tasks, review their output to validate the quality of the results. You can also perform this task after a model has predicted labels for tasks in your dataset.
<div class="enterprise"><p>
The annotation review workflow is only available in Label Studio Enterprise Edition. If you're using Label Studio Community Edition, see <a href="label_studio_compare.html">Label Studio Features</a> to learn more.
</p></div>
## Why review annotations?
Data labeling is a crucial step for training many machine learning models, and it's essential to review annotations to make sure that only the highest quality data is used to train your machine learning model. If you don't review the quality of labeled data, weak annotations might be used when training your model and degrade overall model performance.
## Choose what to review
You can start reviewing tasks randomly, or order tasks in the project data manager in different ways, depending on your use case:
- Order tasks by annotator, to review annotations and assess individual annotator performance at the same time.
- Order tasks by agreement, to review annotations with more uncertainty among annotators first.
- Order tasks by model confidence score, to review the annotations that a machine learning model was less certain about first.
## Review annotated tasks
After you choose what to review, start reviewing annotated tasks:
1. From within a project, click the **Review All Tasks** button. If you select a subset of tasks to review, the number of those tasks appears in the button.
2. Review the first task and annotation. By default, you view the tasks in numeric order. You can see the annotator and their annotation.
- If the annotation is correct, click **Accept**.
- If the annotation is mostly correct, you can correct it by selecting a different option, changing which region is selected, moving the bounding box, or whichever makes sense for the type of labeling you're reviewing. After correcting the annotation, click **Fix & Accept**.
- If the annotation is completely incorrect, or you don't want to attempt to correct it at all, click **Reject** to reject the annotation. To place a rejected task back in the Label Stream for annotation, you must delete the annotation. Rejecting an annotation does not return it to annotators to re-label.
3. Continue reviewing annotated tasks until you've reviewed all annotated tasks. Click **Data Manager** to return to the list of tasks for the project.
If there are multiple annotations, you can select the tab of each annotation by annotator and result ID to view them separately. The annotation result ID is different from the task ID visible in the left menu. To see annotations side-by-side, you can click the task in the Data Manager and view a grid of annotations in the task preview mode.
### Assign reviewers to tasks
You can assign reviewers to tasks, or people with access can review tasks on an ad hoc basis. Anyone who is assigned to a task or who completes a review of a task appears in the Reviewers column on the Data Manager. You can assign reviewers to multiple tasks at once, but you cannot remove reviewers from multiple tasks at once.
1. For a specific project, select tasks on the Data Manager.
2. Select the dropdown and choose **Assign Reviewers**.
3. Select names of reviewers and click the `>` arrow to assign them to the selected tasks.
4. Click **Assign**.
## Verify model and annotator performance
Use the project dashboard to verify annotator performance. For a project, click **Dashboard** to view the dashboard.
If you don't see an annotator's activity reflected on the dashboard, make sure they have been added as a member to the project.
### Review dataset progress
The dataset progress displays the number of tasks considered to be fully annotated for the project. If the project requires more than one annotation per task, some tasks might not appear as "annotated" because they are not yet fully annotated according to the project standards.
You can review how many tasks are left to be completed by annotators, how many tasks have been skipped, and how many tasks have been reviewed.
### Review annotator performance
For each project, you can open the project dashboard and review the Annotator Performance section to learn more about the annotators and their annotations, as well as overall annotator consensus.
Discover how many annotators have worked on the project, and how many hours they cumulatively spent labeling. You can also see the total number of annotations produced by the annotators, separate from the total number of tasks in the project.
Review a table to see the following for each annotator:
- The total agreement between one annotator and all other annotators.
- The number of tasks that they have finished annotating.
- The number of tasks that they have skipped.
- The reviewing outcome for the annotations they performed.
- The total annotation progress across all tasks.
- The mean time to annotate tasks.
- The agreement of their annotations with the ground truth annotations, if there are any.
- The agreement of their annotations with the predicted annotations, if there are any.
### Review annotator agreement
You can also review annotator agreement on a per-annotator basis with the annotator agreement matrix.
Review the annotator agreement matrix to understand which annotators' annotations consistently agree or disagree with other annotators' annotations.
To see the specific annotations contributing to the agreement, do the following:
1. Open the data manager for the project.
2. Locate a task annotated by the different annotators that you want to compare.
3. Click the task to open the task preview.
4. Click each annotation tab to compare how the different annotations differ. The initials of each annotator appear in the tab header with the annotation ID.
## Review annotations against ground truth annotations
Define ground truth annotations in a Label Studio project. Use ground truth annotations to assess the quality of your annotated dataset. Review ground truths to make sure that annotators are accurately labeling data at the start of the project and continually throughout the lifecycle of training dataset creation.
Label Studio Enterprise compares annotations from annotators and model predictions against the ground truth annotations for a task to calculate an accuracy score between 0 and 1.
> Ground truth annotations are only available in Label Studio Enterprise Edition. If you're using Label Studio Community Edition, see [Label Studio Features](label_studio_compare.html) to learn more.
## Define ground truth annotations for a project
You can define ground truth annotations from a project's Data Manager page:
1. When viewing the data manager for a project, select the checkboxes next to annotated tasks.
2. In the selected tasks dropdown menu, select **Assign ground truths**. If there are multiple annotations for a task, only the first (earliest) annotation is assigned as a ground truth.
3. Confirm that you want to set the selected task annotations as ground truths.
You can also assign ground truths when you annotate a task.
1. When labeling a task, create an annotation or select an existing one.
2. Click the star icon to label the annotation as a ground truth.
## Manage ground truth annotations for a project
Review and modify the ground truth annotations for a project.
### Review existing ground truth annotations
You can filter the Data Manager to show only tasks with ground truth annotations so that you can review them.
### Remove ground truth annotations
To remove ground truth annotations:
1. When viewing the data manager for a project, select the checkboxes next to annotated tasks.
2. In the selected tasks dropdown menu, select **Delete ground truths**. This does not delete the annotation, but changes the status of the ground truth setting for the annotation to false.
You can also remove ground truths when you annotate a task.
1. When labeling a task, select an annotation that is currently marked as a ground truth.
2. Click the star icon again to remove the ground truth label from the annotation.

View File

@ -0,0 +1,68 @@
---
title: Secure Label Studio
type: guide
order: 220
meta_title: Secure Label Studio
meta_description: About the security and hardening processes used by Label Studio Community and Enterprise Editions, and how you can configure your data labeling project to be more secure.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
Label Studio provides many ways to secure access to your data and your deployment architecture.
All application component interactions are encrypted using the TLS protocol.
<div class="enterprise"><p>
<img src="/images/LSE/en.svg" width=64 height=16 alt="Enterprise" style="vertical-align:middle"/> Role-based access control is only available in Label Studio Enterprise deployments. Label Studio Enterprise is available as on-premises software that you manage, or as a Software-as-a-Service (SaaS) offering.
</p></div>
<!--If you need to meet strong privacy regulations, legal requirements, or you want to make a custom installation within your infrastructure or any public cloud (AWS, Google, Azure, etc.), Label Studio Enterprise works on-premises. It is a self-contained version (no Internet connection is required) of the Platform, no data will leave your infrastructure. To make the installation the most accessible, we offer a Docker image.-->
If you're running the open source version in production, restrict access to the Label Studio server. Label Studio establishes secure connections to the web application by enforcing HTTPS and secured cookies. Restrict access to the server itself by opening only the [required ports](install.html#Port_requirements) on the server.
## Secure user access to Label Studio
Secure user access to Label Studio to protect data integrity and allow changes to be performed only by those with access to the system.
Each user must create an account with a password of at least 8 characters, allowing you to track who has access to Label Studio and which actions they perform.
You can restrict signup to only those with a link to the signup page, and the invitation link to the signup page can be reset. See [Set up user accounts for Label Studio](signup.html) for more.
<i class='ent'></i> If you're using Label Studio Enterprise, you can further secure user access in many ways:
- Assign specific roles to specific user accounts to set up role-based access control. For more about the different roles and permissions in Label Studio Enterprise, see [Manage access to Label Studio](manage_users.html).
- Set up organizations, workspaces, and projects to separate projects and data across different groups of users. Users in one organization cannot see the workspaces or projects in other organizations. For more about how to use organizations, workspaces, and projects to secure access, see [Organize projects in Label Studio](organize_projects.html).
## Secure API access to Label Studio
Access to the REST API is restricted by user role and requires an access token that is specific to a user account. Access tokens can be reset at any time from the Label Studio UI or using the API.
## Secure access to data in Label Studio
Data in Label Studio is stored in one or two places, depending on your deployment configuration.
- Project settings and configuration details are stored in a SQLite or PostgreSQL database.
- Project data and annotations can be stored in the SQLite or PostgreSQL database, or stored in a local file directory, a Redis database, or cloud storage buckets on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. Data stored in external storage is accessed by Label Studio using URLs, and the data is not stored in Label Studio directly.
### Secure database access
Label Studio does not permit direct access to the SQLite or PostgreSQL databases from the app to prevent SQL injection attacks and other data exfiltration attempts.
Instead, the app uses URIs to access the data stored in the database. These URIs can only be accessed by the Label Studio labeling interface and API because the requests to retrieve the data using those URIs are verified and proxied by Basic Authentication headers.
All specific object properties that are exposed with a REST API are added to an allowlist. The API endpoints can only be accessed with specific HTTP verbs and must be accessed by browser-based clients that implement a proper Cross-Origin Resource Sharing (CORS) policy. API tokens are user-specific and can be reset at any time.
The PostgreSQL database has SSL mode enabled and requires valid certificates.
### Secure access to cloud storage
When using Label Studio, users don't have direct access to cloud storage. Objects are retrieved from and stored in cloud storage buckets according to the [cloud storage settings](storage.html) for each project.
Label Studio accesses the data stored in remote cloud storage using URLs, so place the data in cloud storage buckets near where your team works, rather than near where you host Label Studio.
Use workspaces, projects, and roles to further secure access to cloud storage and data accessed using URLs by setting up cloud storage credentials. You can provide cloud storage authentication credentials globally for all projects in Label Studio, or use different credentials for access to different buckets on a per-project basis. Label Studio allows you to configure different cloud storage buckets for different projects, making it easier to manage access to the data. See [Sync data from external storage](storage.html).
In Label Studio Enterprise, if you're using Amazon S3, Label Studio can use an IAM role configured with an external ID to access S3 bucket contents securely. See [Set up an S3 connection with IAM role access](storage.html#Set-up-an-S3-connection-with-IAM-role-access).
### Secure access to Redis storage
If you use Redis as an external storage database for data and annotations, the setup supports TLS/SSL and requires the Label Studio client to be authenticated to the database with a valid certificate.
## Audit logging
Label Studio Enterprise automatically logs all user activities so that you can monitor the activities being performed in the application.

View File

@ -0,0 +1,87 @@
---
title: Set up your labeling interface
type: guide
order: 401
meta_title: Set up labeling config interface
meta_description: Customize and configure your data labeling and annotation interface with templates or custom tag combinations in the Label Studio UI for your machine learning and data science projects.
---
All labeling activities in Label Studio occur in the context of a project. After you [create a project](setup_project.html#Create-a-project) and [import data](tasks.html), set up the labeling interface and labeling configuration for your project. This setup process is essential to your labeling project.
## Set up the labeling interface for your project
To set up the labeling interface for your project, configure the labels and task type using the templates included with Label Studio, or define your own combination of tags.
1. Select a template from the [available templates](/templates) or customize one.
2. Label Studio automatically selects the field to label based on your data. If needed, modify the selected field.
3. Add label names on new lines.
4. (Optional) Choose new colors for the labels by clicking the label name and choosing a new color using the color selector.
5. Configure additional settings relevant to the labeling interface functionality. For example, when labeling text you might have the option to **Select text by words**.
6. Click **Save**.
### Modify the labeling interface
You can make changes to the labeling interface and configuration in the project settings.
1. In Label Studio UI, open the project you want to modify.
2. Click **Settings**.
3. Click **Labeling Interface**.
4. Browse templates, update the available labels, and save your changes.
> **Note:** After you start to annotate tasks, you cannot remove labels or change the type of labeling being performed, for example by choosing a new template, unless you delete the completed annotations using those labels.
## Customize a template
You can customize a [labeling config template](/templates) or use a custom configuration that you create from scratch. If you create a custom configuration that might be useful to other Label Studio users, consider [contributing the configuration as a template](https://github.com/heartexlabs/label-studio/tree/master/label_studio/examples).
The labeling configuration for a project is an XML file that contains three types of tags specific to Label Studio.
| Tag type | When to use |
| --- | --- |
| Object | Specify the data type and input data sources from your dataset. |
| Control | Configure how the annotation results appear. |
| Visual | Define how the user interface looks for labeling. |
You can combine these tags to create a custom label configuration for your dataset.
<a class="button" href="/tags">See All Available Tags</a>
### Example labeling config
For example, to classify images that are referenced in your data as URLs (`$image_url`) into one of two classes, Cat or Dog, use this example labeling config:
```xml
<View>
<Image name="image_object" value="$image_url"/>
<Choices name="image_classes" toName="image_object">
<Choice value="Cat"/>
<Choice value="Dog"/>
</Choices>
</View>
```
This labeling config references the image resource in the [Image](/tags/image.html) object tag, and specifies the available labels to select in the [Choices](/tags/choices.html) control tag.
If you want to customize this example, such as to allow labelers to select both Cat and Dog labels for a single image, modify the parameters used with the [Choices](/tags/choices.html) control tag:
```xml
<View>
<Image name="image_object" value="$image_url"/>
<Choices name="image_classes" toName="image_object" choice="multiple">
<Choice value="Cat"/>
<Choice value="Dog"/>
</Choices>
</View>
```
## Set up labeling config in other ways
If you want to specify a labeling configuration for your project without using the Label Studio UI, you can use the command line or the API.
### Add a labeling config from the command line
You can define the labeling configuration in a `config.xml` file and initialize a specific project in Label Studio with that file.
```bash
label-studio my_new_project start --label-config config.xml
```
### Add a labeling config with the API
You can configure your labeling configuration with the server API. See the [Backend API](api.html) documentation for more details.
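For example, a minimal sketch that updates a project's labeling configuration through the REST API, reusing the Cat/Dog example above; the host, project ID, and token are placeholders:
```python
import requests

API_TOKEN = "your-access-token"  # placeholder: copy it from Account & Settings
PROJECT_ID = 1                   # placeholder: your project's ID

label_config = """
<View>
  <Image name="image_object" value="$image_url"/>
  <Choices name="image_classes" toName="image_object">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""

response = requests.patch(
    f"http://localhost:8080/api/projects/{PROJECT_ID}",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json={"label_config": label_config},
)
print(response.status_code)
```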

View File

@ -0,0 +1,195 @@
---
title: Set up your labeling project
short: Project setup
type: guide
order: 400
meta_title: Set up your labeling project
meta_description: Set up data labeling and annotation projects in Label Studio to produce high-quality data for your machine learning and data science projects.
---
All labeling activities in Label Studio occur in the context of a project.
After you [start Label Studio](start.html) and [create an account](signup.html), create a project to start labeling your data.
1. [Create a project](#Create-a-project)
2. [Import data](tasks.html).
3. Select a template to configure the labeling interface for your dataset. [Set up the labeling interface for your project](setup.html).
4. (Optional) [Set up annotation settings for your project](#Set-up-annotation-settings-for-your-project).
5. (Optional, Label Studio Enterprise only) [Set up review settings for your project](#Set-up-review-settings-for-your-project).
## Create a project
When you're creating a project, you can save your progress at any time. You don't need to import your data and set up the labeling interface all at the same time, but you can.
1. In the Label Studio UI, click **Create**.
2. Type a project name and a description. If you want, choose a color for your project.
3. If you're ready to import your data, click **Data Import** and import data from the Label Studio UI. For details about import formats and data types, see [Get data into Label Studio](tasks.html).
4. If you're ready to set up the labeling interface, click **Labeling Setup** and choose a template or create a custom configuration for labeling. See [Set up the labeling interface for your project](setup.html).
5. When you're done, click **Save** to save your project.
After you save a project, any other collaborator with access to the Label Studio instance can view your project, perform labeling, and make changes. To use role-based access control, you need to use Label Studio Enterprise Edition.
## <i class='ent'></i> Open a project to annotators
In Label Studio Enterprise, you can hide projects from annotators so that you can fully configure the project before annotators can start labeling. When you're ready for annotators to start labeling, open the project to annotators.
Before you can open a project to annotators, make sure that you've done the following:
- [Set up the labeling interface](setup.html).
- [Imported data](tasks.html).
- [Moved the project to the correct workspace](manage_users.html#Create-workspaces-to-organize-projects), if it was in your private sandbox.
To open the project to annotators, do the following:
1. Open a project and navigate to the project **Dashboard**.
2. Toggle **Open to Annotators** so that the switch is enabled.
3. Annotators can then view the project and start receiving assigned tasks according to the method that you use to [distribute tasks for labeling](#Set-up-task-distribution-for-labeling).
## Delete tasks or annotations
If you have duplicate tasks, or want to remove annotations, you can delete tasks and annotations from Label Studio.
1. In Label Studio UI, open the project you want to update.
2. Filter the Data Manager page to show only the data you want to delete. For example, specific annotations, or tasks annotated by a specific annotator.
3. Select the checkboxes for the tasks or annotations that you want to delete.
4. Select the dropdown with the number of tasks, and choose **Delete tasks** or **Delete annotations**.
5. Click **Ok** to confirm your action.
If you want to make changes to the labeling interface or perform a different type of data labeling, first select all the annotations for your dataset and delete them.
## Set up annotation settings for your project
Set up annotation settings to configure how you want annotators to perform labeling for your project.
<div class="enterprise"><p>
Some annotation settings are only available in Label Studio Enterprise Edition. If you're using Label Studio Community Edition, see <a href="label_studio_compare.html">Label Studio Features</a> to learn more.
</p></div>
### Set up instructions for data labelers
In the project settings, you can add instructions and choose whether to show the instructions to annotators before they perform labeling.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Instructions**, or in Label Studio Enterprise, click **Annotation Settings**.
3. Type instructions and choose whether to show the instructions to annotators before labeling.
4. Click **Save**. <br/>Click the project name to return to the data manager view.
Annotators can view instructions at any time when labeling by clicking the (i) button from the labeling interface.
### <i class='ent'></i> Set up task distribution for labeling
Select how you want to distribute tasks to annotators for labeling. Unlike task sampling, this setting controls whether you must assign annotators to tasks before they can start labeling.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Annotation Settings**.
3. Under **Distribute Labeling Tasks**, select one of the following:
- Auto, the default option, to distribute tasks automatically to annotators.
- Manual, to show tasks to assigned annotators first, then automatically distribute unassigned tasks.
Your changes save automatically.
> You can't assign annotators to tasks unless you select the **Manual** option.
### <i class='ent'></i> Set minimum annotations per task
By default, each task only needs to be annotated by one annotator. If you want multiple annotators to be able to annotate tasks, set the Overlap of Annotations for a project in the project settings.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Annotation Settings**.
3. Under **Overlap of Annotations**, select the number of minimum annotations for a task.
4. Choose whether to enforce the overlap for the default of 100% of tasks, or a smaller percentage.
5. Choose whether to show tasks that require multiple annotations, **tasks with overlap**, before other tasks that need to be annotated.
6. Your changes save automatically. Return to the **Data Manager** and assign annotators to the tasks so that they can annotate the tasks.
#### How task overlap works
For example, if you want all tasks to be annotated by at least 2 annotators:
- Set the minimum number of annotations to **2**
- Enforce the overlap for 100% of tasks.
If you want at least half of the tasks to be annotated by at least 3 people:
- Set the minimum number of annotations to **3**
- Enforce the overlap for 50% of tasks.
If you're using manual distribution of tasks, annotators with tasks assigned to them label those tasks first, then Label Studio automatically distributes the remaining tasks to the project annotators so that the desired overlap and minimum number of annotations per task can be achieved.
### Set annotating options
If you want, you can allow empty annotations.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Annotation Settings**.
3. Under **Annotating Options**, select **Allow empty annotations**. By default, empty annotations are allowed.
### Set up task sampling
If you're using Label Studio Community Edition, you must set up task sampling when you start Label Studio. See [Set up task sampling for your project](start.html#Set-up-task-sampling-for-your-project).
<i class='ent'></i> In Label Studio Enterprise, you can set up task sampling in the annotation settings for a project.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Annotation Settings**.
3. Select your preferred method of task sampling:
- Uncertainty sampling, where tasks are shown to annotators according to the model uncertainty, or prediction scores.
- Sequential sampling, the default, where tasks are shown to annotators in the same order that they appear on the Data Manager.
- Uniform sampling, where tasks are shown to annotators in a random order.
4. You can also choose whether to show tasks with ground truth labels first.
Your changes save automatically.
### <i class='ent'></i> Define the matching function for annotation statistics
Annotation statistics such as annotator consensus are calculated using a matching score. If you want the matching score to calculate matches by requiring exact matching choices, choose that option in the annotation settings.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Annotation Settings**.
3. Under **Matching Function**, select **Exact matching choices**.
Your changes save automatically. For more about how annotation statistics are calculated in Label Studio Enterprise, see [Task agreement and annotator consensus in Label Studio](stats.html).
## <i class='ent'></i> Set up review settings for your project
Set up review settings to guide reviewers when they review annotated tasks.
<div class="enterprise"><p>
Review settings and the review stream are only available in Label Studio Enterprise Edition. If you're using Label Studio Community Edition, see <a href="label_studio_compare.html">Label Studio Features</a> to learn more.
</p></div>
### Set up instructions for task reviewers
In the project settings, you can add instructions and choose whether to show the instructions to reviewers before they start reviewing annotated tasks.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Review Settings**.
3. Type instructions and choose whether to show the instructions to reviewers before reviewing annotated tasks.
4. Click **Save**. <br/>Click **Data Manager** to return to the data manager view.
### Set reviewing options
Configure the reviewing settings for your project.
1. Within a project on the Label Studio UI, click **Settings**.
2. Click **Review Settings**.
3. Under **Reviewing Options**, choose whether to mark a task as reviewed if at least one annotation has been reviewed, or only after all annotations for a task have been processed.
4. Under **Reviewing Options**, choose whether to anonymize annotators when reviewing tasks.
Your changes save automatically.
## Where Label Studio stores your project data and configurations
Starting in version 1.0.0, Label Studio stores your project data and configurations in a SQLite database. You can choose to use PostgreSQL or Redis instead. See [Set up database storage](storedata.html).
In versions of Label Studio earlier than 1.0.0, when you start Label Studio for the first time, it launches from a project directory that Label Studio creates, called `./my_project` by default.
`label-studio start ./my_project --init`
### Project directory structure
In versions of Label Studio earlier than 1.0.0, the project directory is structured as follows:
```
├── my_project
│ ├── config.json // project settings
│ ├── tasks.json // all imported tasks in a JSON dictionary: {task_id: task}
│ ├── config.xml // labeling config for the current project
│ ├── completions // directory with all completed annotations stored in one file for each task_id
│ │ ├── <task_id>.json
│ ├── export // stores archives with all results exported from Label Studio UI
│ │ ├── 2020-03-06-15-23-47.zip
```
> Warning: Modifying any of the internal project files is not recommended and can lead to unexpected behavior. Use the Label Studio UI or command line arguments (run `label-studio start --help`) to import tasks, export completed annotations, or to change label configurations.

118
docs/source/guide/signup.md Normal file
View File

@ -0,0 +1,118 @@
---
title: Set up user accounts
type: guide
order: 250
meta_title: User Accounts
meta_description: Sign up for Label Studio and invite users to collaborate on your data labeling, machine learning, and data science projects.
---
Sign up and create an account for Label Studio to start labeling data and setting up projects.
Everyone with an account in Label Studio has access to the same functionality. If you're using Label Studio Enterprise, see [Manage access to Label Studio](manage_users.html) for details about what role-based access control is available.
## Create an account
When you first [start Label Studio](start.html), you see the sign up screen.
1. Create an account with your email address and a password.
2. Log in to Label Studio.
Accounts that you create are stored locally on the Label Studio server and allow multiple annotators to collaborate on a specific data labeling project.
If you want, you can create an account from the command line when you start Label Studio.
```bash
label-studio start --username <username> --password <password> [--user-token <token-at-least-5-chars>]
```
> Note: The `--user-token` argument is optional. If you don't set the user token, one is automatically generated for the user. Use the user token for API access. The minimum token length is 5 characters.
### Retrieve user info from the command line
You can retrieve information about a user, including the API user token for a user, from the command line after starting Label Studio.
From the command line, run the following:
```bash
label-studio user --username <username>
```
You can see user info as the last line of the response. For example:
```
=> User info:
{'id': 1, 'first_name': 'User', 'last_name': 'Somebody', 'username': 'label-studio', 'email': 'example@labelstud.io', 'last_activity': '2021-06-15T19:37:29.594618Z', 'avatar': '/data/avatars/071280b8-48ACD59200000578-5322459-image-m-23_1517162202847.jpg', 'initials': 'el', 'phone': '', 'active_organization': 1, 'token': '1bc2c33cb44e56cb9f1e191238ffb78564675faa', 'status': 'ok'}
```
You can use the output to retrieve the token for a user and use the token to call the API. You can also retrieve the user token from the Label Studio UI. See more in the [Label Studio API documentation](api.html).
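For example, a minimal sketch of an authenticated API call that lists projects; the token below is the placeholder value from the sample output above:
```python
import requests

# Token from `label-studio user --username <username>` or from
# Account & Settings in the Label Studio UI.
API_TOKEN = "1bc2c33cb44e56cb9f1e191238ffb78564675faa"

response = requests.get(
    "http://localhost:8080/api/projects",
    headers={"Authorization": f"Token {API_TOKEN}"},
)
print(response.status_code, response.json())
```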
### Restrict signup for local deployments
To restrict who has access to your Label Studio instance, invite collaborators directly using an invitation link.
To disable the signup page unless someone uses the invitation link, do the following from the command line after installing Label Studio:
```bash
export LABEL_STUDIO_DISABLE_SIGNUP_WITHOUT_LINK=true
```
You can then start Label Studio and create an account for yourself to use to log into Label Studio:
```bash
label-studio start --username <username> --password <password>
```
After you log into Label Studio, you can start [inviting collaborators](#Invite-collaborators-to-a-project).
### Restrict signup for cloud deployments
To restrict signup to only those with a link on cloud deployments, set the following environment variables after you install but before you start Label Studio:
```
LABEL_STUDIO_DISABLE_SIGNUP_WITHOUT_LINK=true
LABEL_STUDIO_USERNAME=<username>
LABEL_STUDIO_PASSWORD=<password>
# token is optional, it is generated automatically if not set
LABEL_STUDIO_USER_TOKEN=<token-at-least-5-chars>
```
Then, start Label Studio and log in with the username and password that you set as environment variables and start [inviting collaborators](#Invite-collaborators-to-a-project).
## Invite collaborators to a project
After you [set up a labeling project](setup.html), invite annotators to the project to start collaborating on labeling tasks. Inviting people to your Label Studio instance with a link does not restrict access to the signup page unless you also set an environment variable. See how to [Restrict signup for local deployments](#Restrict-signup-for-local-deployments) and [Restrict signup for cloud deployments](#Restrict-signup-for-cloud-deployments) on this page.
1. In the Label Studio UI, click the hamburger icon and click **People**.
2. Click **+ Add People**.
3. Copy the invitation link and share it with those that you want to invite to Label Studio. If you need to update the link and deactivate the old one, return to this page and click **Reset Link**. The link only resets if the signup page is also disabled.
## Manage your account in Label Studio
After you create an account in Label Studio, you can make changes to it as needed.
1. From the Label Studio UI, click the user icon in the upper right.
2. Click **Account & Settings**.
3. Update your display name and add a profile picture no larger than 512 x 512 pixels.
4. Click **Save**.
## Review existing accounts in Label Studio
You can review the existing accounts in Label Studio to see which people created which projects, and to which projects they contributed annotations.
1. From the Label Studio UI, click the hamburger icon and click **People**.
2. Review the list of users by email address and name. You can see the last time a user was active in Label Studio.
3. Click a row to see additional detail about a specific user, including the projects that they created or contributed annotations to.
### Reset password
If you forget your password or change passwords regularly for security reasons, you can change it from the command line.
1. On the server running Label Studio, run the following command:
```bash
label-studio reset_password
```
2. When prompted, type the username and the new password. You see `Password successfully changed`.
You can also use optional command line arguments to reset the password for a username.
- Specify the username and type the password when prompted:
```bash
label-studio reset_password --username <username>
New password:
```
- Specify both the username and the password:
```bash
label-studio reset_password --username <username> --password <password>
```

203
docs/source/guide/start.md Normal file
View File

@ -0,0 +1,203 @@
---
title: Start Label Studio
type: guide
order: 206
meta_title: Start Commands for Label Studio
meta_description: Documentation for starting Label Studio and configuring the environment to use Label Studio with your machine learning or data science project.
---
After you install Label Studio, start the server to start using it.
```bash
label-studio start
```
By default, Label Studio starts with a SQLite database to store labeling tasks and annotations. You can specify different source and target storage for labeling tasks and annotations using Label Studio UI or the API. See [Database storage](storedata.html) for more.
## Command line arguments for starting Label Studio
You can specify a machine learning backend and other options using the command line interface. Run `label-studio --help` to see all available options, or refer to the following tables.
Some available commands for Label Studio provide information or start the Label Studio server:
| Command | Description |
| --- | ---- |
| `label-studio` | Start the Label Studio server. |
| `label-studio -h` `label-studio --help` | Display available command line arguments. |
| `label-studio init <project_name> <optional_arguments>` | Initialize a specific project in Label Studio. |
| `label-studio start <project_name> --init <optional_arguments>` | Start the Label Studio server and initialize a specific project. |
| `label-studio reset_password` | Reset the password for a specific Label Studio username. See [Create user accounts for Label Studio](signup.html). |
| `label-studio shell` | Get access to a shell for Label Studio to manipulate data directly. See documentation for the Django [shell-plus command](https://django-extensions.readthedocs.io/en/latest/shell_plus.html). |
| `label-studio version` | Show the version of Label Studio and then terminate. |
| `label-studio user --username <email>` | Show user info, including the API token. |
The following command line arguments are optional and must be specified with `label-studio start <argument> <value>` or as an environment variable when you set up the environment to host Label Studio:
| Command line argument | Environment variable | Default | Description |
| --- | ---- | --- | ---- |
| `-b`, `--no-browser` | N/A | `False` | Do not automatically open a web browser when starting Label Studio. |
| `-db`, `--database` | `LABEL_STUDIO_DATABASE` | `label_studio.sqlite3` | Specify the database file path for storing labeling tasks and annotations. See [Database storage](install.html#Database_storage). |
| `--data-dir` | `LABEL_STUDIO_BASE_DATA_DIR` | OS-specific | Directory to use to store all application-related data. |
| `-d`, `--debug` | N/A | `False` | Enable debug mode for troubleshooting Label Studio. |
| `-c`, `--config` | `CONFIG_PATH` | `default_config.json` | Deprecated, do not use. Specify the path to the server configuration for Label Studio. |
| `-l`, `--label-config` | `LABEL_STUDIO_LABEL_CONFIG` | `None` | Path to the label configuration file for a specific Label Studio project. See [Set up your labeling project](setup.html). |
| `--ml-backends` | `LABEL_STUDIO_ML_BACKENDS` | `None` | Deprecated as a command line argument. Specify the URLs for one or more machine learning backends. See [Set up machine learning with your labeling process](ml.html). |
| `--sampling` | N/A | `sequential` | Specify one of `sequential` or `uniform` to define the order for labeling tasks. See [Set up task sampling for your project](start.html#Set_up_task_sampling_for_your_project) on this page. |
| `--log-level` | N/A | `ERROR` | One of `DEBUG`, `INFO`, `WARNING`, or `ERROR`. Use to specify the logging level for the Label Studio server. |
| `-p`, `--port` | `LABEL_STUDIO_PORT` | `8080` | Specify the web server port for Label Studio. See [Run Label Studio on localhost with a different port](start.html#Run-Label-Studio-on-localhost-with-a-different-port) on this page. |
| `--host` | `LABEL_STUDIO_HOST` | `''` | Specify the hostname to use to generate links for imported labeling tasks or static loading requirements. Leave empty to make all paths relative to the root domain. For example, specify `"https://77.42.77.42:1234"` or `"http://ls.example.com/subdomain/"`. See [Run Label Studio with an external domain name](start.html#Run-Label-Studio-with-an-external-domain-name) on this page. |
| `--cert` | `LABEL_STUDIO_CERT_FILE` | `None` | Deprecated, do not use. Certificate file to use to access Label Studio over HTTPS. Must be in PEM format. See [Run Label Studio with HTTPS](start.html#Run-Label-Studio-with-HTTPS) on this page. |
| `--key` | `LABEL_STUDIO_KEY_FILE` | `None` | Deprecated, do not use. Private key file for HTTPS connection. Must be in PEM format. See [Run Label Studio with HTTPS](start.html#Run-Label-Studio-with-HTTPS) on this page. |
| `--initial-project-description` | `LABEL_STUDIO_PROJECT_DESC` | `''` | Specify a project description for a Label Studio project. See [Set up your labeling project](setup.html). |
| `--password` | `LABEL_STUDIO_PASSWORD` | `None` | Password to use for the default user. See [Set up user accounts](signup.html). |
| `--username` | `LABEL_STUDIO_USERNAME` | `default_user@localhost` | Username to use for the default user. See [Set up user accounts](signup.html). |
| `--user-token` | `LABEL_STUDIO_USER_TOKEN` | Automatically generated | Authentication token for a user to use for the API. Must be set with a username, otherwise automatically generated. See [Set up user accounts](signup.html). |
| `--agree-fix-sqlite` | N/A | `False` | Automatically agree to let Label Studio fix SQLite issues when using Python 3.6-3.8 on Windows operating systems. |
| N/A | `LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED` | `False` | Allow Label Studio to access local file directories to import storage. See [Run Label Studio on Docker and use local storage](start.html#Run_Label_Studio_on_Docker_and_use_local_storage). |
| N/A | `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` | `/` | Specify the root directory for Label Studio to use when accessing local file directories. See [Run Label Studio on Docker and use local storage](start.html#Run_Label_Studio_on_Docker_and_use_local_storage). |
### Set environment variables
How you set environment variables depends on the operating system and the environment in which you deploy Label Studio.
On *nix operating systems, you can set environment variables from the command line or with an environment setup file. For example:
```bash
export LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
```
You can also use an `.env` file.
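For example, a minimal `.env` file might contain the following; the variable names match the tables above and the values are placeholders:
```bash
# example .env contents
LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_PORT=8080
```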
On Windows, you can use the following syntax:
```bash
set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
```
To check if you set an environment variable successfully, run the following on *nix operating systems:
```bash
echo $LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED
```
Or the following on Windows operating systems:
```bash
echo %LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED%
```
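Note that `set` only applies to the current Windows session. As a sketch, if you want a variable to persist for future sessions, you can use `setx` instead; `setx` only affects shells opened afterward:
```bash
setx LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED true
```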
## Run Label Studio on localhost with a different port
By default, Label Studio runs on port 8080. If that port is already in use or if you want to specify a different port, start Label Studio with the following command:
```bash
label-studio start --port <port>
```
For example, start Label Studio on port 9001:
```bash
label-studio start --port 9001
```
Or, set the following environment variable:
```
LABEL_STUDIO_PORT = 9001
```
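On *nix operating systems, you can also set the variable inline for a single run:
```bash
LABEL_STUDIO_PORT=9001 label-studio start
```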
## Run Label Studio on Docker with a different port
To run Label Studio on Docker with a port other than the default of 8080, use the port argument when starting Label Studio on Docker. For example, to start Label Studio in a Docker container accessible with port 9001, run the following:
```bash
docker run -it -p 9001:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest label-studio
```
Or, if you're using Docker Compose, update the `docker-compose.yml` file that you're using to expose a different port for the NGINX server used to proxy the connection to Label Studio. For example, this portion of the [`docker-compose.yml`](https://github.com/heartexlabs/label-studio/blob/master/docker-compose.yml) file exposes port 9001 instead of port 80 for proxying Label Studio:
```
...
nginx:
image: nginx:latest
ports:
- 9001:80
depends_on:
- app
...
```
## Run Label Studio on Docker with a host and sub-path
To run Label Studio on Docker with a host and sub-path, you can refer to the example `deploy/dockerfiles/subpath.example.yml` Docker YAML file. To customize it for your environment, manually modify the `nginx/subpath.example.simple.conf` NGINX configuration file, the relevant Label Studio environment variables, and the PostgreSQL database settings for your environment.
## Run Label Studio on Docker and use local storage
To run Label Studio on Docker and reference persistent local storage directories, mount those directories as volumes when you start Label Studio and specify any environment variables you need.
The following command starts a Docker container with the latest Label Studio image, maps port 8080, and sets an environment variable that allows Label Studio to access local files. In this example, a local directory `./myfiles` is mounted to the `/label-studio/files` location.
```bash
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data \
--env LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
--env LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/files \
-v `pwd`/myfiles:/label-studio/files \
heartexlabs/label-studio:latest label-studio
```
By specifying the environment variable `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/files`, Label Studio scans only this directory for local files. It's highly recommended to explicitly specify a `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` path to restrict which directories on your local machine the Docker container can access.
Place files in the specified source directory (`./myfiles` in this example) and reference that directory when you set up [local storage](storage.html#Local_storage). For more information about using volume mounts in Docker, see the [Docker documentation on volumes](https://docs.docker.com/storage/volumes/).
If you're using Docker Compose, specify the volumes in the Docker Compose YAML file and add the relevant environment variables to the app container. For more about specifying volumes in Docker Compose, see the volumes section of the [Docker Compose file documentation](https://docs.docker.com/compose/compose-file/compose-file-v3/#volumes).
## Run Label Studio with HTTPS
To access the Label Studio web server over HTTPS in the browser, use NGINX or another web server to terminate HTTPS in front of Label Studio.
## Run Label Studio on the cloud using Heroku
To run Label Studio on the cloud using Heroku, set the following environment variable so that Label Studio loads properly:
```
LABEL_STUDIO_HOST
```
If you want, you can specify a different hostname for Label Studio, but you don't need to.
To run Label Studio with Heroku and use PostgreSQL as the [database storage](storedata.html), specify the PostgreSQL environment variables required as part of the Heroku environment variable `DATABASE_URL`. For example, to specify a PostgreSQL database hosted on Amazon:
```
DATABASE_URL = postgres://username:password@hostname.compute.amazonaws.com:5432/dbname
```
Then you can specify the required environment variables for a PostgreSQL connection as config variables. See [Database storage](storedata.html).
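For example, as a sketch using the Heroku CLI, with placeholder app and database values:
```bash
heroku config:set LABEL_STUDIO_HOST=https://<your-app>.herokuapp.com
heroku config:set DATABASE_URL=postgres://username:password@hostname.compute.amazonaws.com:5432/dbname
```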
<!--
## Run Label Studio on the cloud using a different cloud provider
To run Label Studio on the cloud using a cloud provider such as Google Cloud Services (GCS), Amazon Web Services (AWS), or Microsoft Azure,
-->
## Run Label Studio with an external domain name
If you want multiple people to collaborate on a project, you might want to run Label Studio with an external domain name.
To do that, use the `host` parameter when you start Label Studio. This parameter ensures that the correct URLs are created when importing resource files (images, audio, and so on) and generating labeling tasks.
There are several possible ways to run Label Studio with an external domain name.
- Replace the `host` parameter in the file that you specified with the `--config` option. If you don't use `--config`, edit `label_studio/utils/schema/default_config.json` in the Label Studio package directory.
- Specify the parameters when you start Label Studio: `label-studio start --host http://your.domain.com/ls-root`.
- Specify the `HOST` environment variable, which is especially useful when setting up Docker: `HOST=https://your.domain.com:7777`.
Or, you can use environment variables:
```
LABEL_STUDIO_HOST = https://subdomain.example.com:7777
```
You must specify the protocol for the domain name: `http://` or `https://`. If your external host uses a port, include the port in the hostname.
## Set up task sampling for your project
When you start Label Studio, you can control the order in which tasks are exposed to annotators for a specific project.
For example, to create a project with sequential task ordering for annotators:
```bash
label-studio start <project_name> --sampling sequential
```
The following table lists the available sampling options:
| Option | Description |
| --- | --- |
| `sequential` | Default. Tasks are shown to annotators in ascending order by the `id` field. |
| `uniform` | Tasks are sampled with equal probabilities. |
| `prediction-score-min` | Tasks with the minimum average prediction score are shown to annotators. To use this option, you must also include predictions data in the task data that you import into Label Studio. |
You can also use the API to set up sampling for a specific project. Send a PATCH request to the `/api/projects/<project_id>` endpoint to set sampling for the specified project. See the [API reference for projects](/api#operation/projects_partial_update).
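For example, the following `curl` sketch sets uniform sampling for project 1; the project ID, token, and the exact `sampling` value accepted by your version are assumptions to verify against the API reference:
```bash
curl -X PATCH http://localhost:8080/api/projects/1 \
  -H "Authorization: Token <your-api-token>" \
  -H "Content-Type: application/json" \
  -d '{"sampling": "Uniform sampling"}'
```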
Individual annotators can also control the order in which they label tasks by adjusting the filtering and ordering of labeling tasks in the Label Studio UI. See [Set up your labeling project](setup.html).
120
docs/source/guide/stats.md Normal file
@ -0,0 +1,120 @@
---
title: Annotation statistics
short: Annotation statistics
badge: <i class='ent'></i>
type: guide
order: 413
meta_title: Data Labeling Statistics
meta_description: Label Studio Enterprise documentation about task agreement, annotator consensus, and other data annotation statistics for data labeling and machine learning projects.
---
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
<div class="enterprise"><p>
Label Studio Enterprise Edition includes various annotation and labeling statistics. The open source Community Edition of Label Studio does not perform these statistical calculations. If you're using Label Studio Community Edition, see <a href="label_studio_compare.html">Label Studio Features</a> to learn more.
</p></div>
## Task agreement
Task agreement shows the consensus between multiple annotators when labeling the same task. There are several types of task agreement in Label Studio Enterprise:
- A per-task agreement score, visible on the Data Manager page for a project. This displays how well the annotations on a particular task match across annotators.
- An inter-annotator agreement matrix, visible on the Dashboard for a project. This displays how well the annotations from specific annotators agree with each other in general, or for specific tasks.
You can also see how the annotations from a specific annotator compare to the prediction scores for a task, or how they compare to the ground truth labels for a task.
## Matching score
A matching score assesses the similarity of annotations for a specific task. The matching score is used differently depending on which agreement metrics are being calculated.
Matching scores are used to determine whether two given annotations for a task, represented by `x` and `y` in this example, match.
- If both `x` and `y` are empty annotations for a task, the matching score is `1`.
- If `x` and `y` share no similar points, the matching score is `0`.
- If there are different labeling types used in the annotations in `x` and/or `y`, partial matching scores for each data labeling type are averaged.
- For categorical task labeling, such as those using the Choices tag, Cohen's Kappa index is computed if specified in the project settings.
The type of data labeling being performed affects how the matching score is computed. The following examples describe how the matching scores for various labeling configuration tags are computed.
### Choices
For data labeling where annotators select a choice, the matching score for two given task annotations `x` and `y` is computed as follows:
- If `x` and `y` are the same choice, the matching score is `1`.
- If `x` and `y` are different choices, the matching score is `0`.
### TextArea
For data labeling where annotators transcribe text in a text area, the resulting annotations contain a list of text items. The matching score for two given task annotations `x` and `y` is computed as follows:
- The list of text items in each annotation is indexed, such that `x = [x1, x2, ..., xn]` and similarly, `y = [y1, y2, ..., yn]`.
- For each aligned pair of text items across the two annotations `(x1, y1)` the similarity of the text is calculated.
- For each unaligned pair, for example, when one list of text is longer than the other, the similarity is zero.
- The similarity scores are averaged across all pairs, and the result is the matching score for the task.
The matching score for each aligned pair can be calculated in multiple ways:
- Using an [edit distance algorithm](https://en.wikipedia.org/wiki/Edit_distance)
- Splitting the list by words
- Splitting the list by characters
Decide what method to use to calculate the matching score based on your use case and how important precision is for your data labels.
### Labels
The matching score is calculated by comparing the intersection of annotations over the result spans, normalized by the length of each span. For two given task annotations `x` and `y`, the matching score formula is `m(x, y) = spans(x) ∩ spans(y)`.
### Rating
For data labeling where annotators select a rating, the matching score for two given task annotations `x` and `y` is computed as follows:
- If `x` and `y` are the same rating, the matching score is `1`.
- If `x` and `y` are different ratings, the matching score is `0`.
### Ranker
The matching score is calculated using the mean average precision (mAP) for the annotation results.
### RectangleLabels
The method used to calculate the matching score depends on what choice you select on the project settings page from the following options:
- Intersection over Union (IoU), averaged over all bounding box pairs with the best match.
- Precision computed for some threshold imposed on IoU.
- Recall computed for some threshold imposed on IoU.
- F-score computed for some threshold imposed on IoU.
### PolygonLabels
The method used to calculate the matching score depends on what choice you select on the project settings page from the following options:
- Intersection over Union (IoU), averaged over all polygon pairs with the best match.
- Precision computed for some threshold imposed on IoU.
- Recall computed for some threshold imposed on IoU.
- F-score computed for some threshold imposed on IoU.
## Agreement method
_Agreement method_ defines how [matching scores](stats.html#Matching-score) across all completions for a task are combined to form a single inter-annotator agreement score.
There are several possible methods that you can specify on the project settings page:
### Complete linkage
Complete linkage task agreement groups annotations so that all the matching scores within a given group are higher than the threshold. The agreement score is the maximum group size divided by the total count of annotations.
Review the diagram for a full explanation:
<div style="text-align:center"><img alt="Diagram showing annotations are collected for each task, matching scores are computed for each pair, and grouping and agreement score calculation happens as detailed in the surrounding text." width=800 height=375 src="/images/LSE/stats-complete-linkage.png"/></div>
### Single linkage
Single linkage task agreement groups annotations so that at least one of the matching scores within a given group is higher than the threshold. The agreement score is the maximum group size divided by the total count of annotations.
Review the diagram for a full explanation:
<div style="text-align:center"><img alt="Diagram showing annotations are collected for each task, matching scores are computed for each pair, and grouping and agreement score calculation happens as detailed in the surrounding text." width=800 height=360 src="/images/LSE/stats-single-linkage.png"/></div>
### No grouping
No grouping task agreement uses the mean average of all inter-annotation matching scores for each annotation pair as the final task agreement score.
Review the diagram for a full explanation:
<div style="text-align:center"><img alt="Diagram showing annotations are collected for each task, matching scores are computed for each pair, the resulting scores are averaged for a task." width=800 height=365 src="/images/LSE/stats-no_grouping.png"/></div>
### Example
Consider one annotation that labels the text span "Excellent tool" as "positive", a second annotation that labels the span "tool" as "positive", and a third annotation that labels the text span "tool" as "negative".
<br/><div style="text-align:center"><img alt="diagram showing example labeling scenario duplicated in surrounding text" width=800 height=100 src="/images/LSE/stats-agreement-example.jpg"/></div>
The matching score for the first two annotations is 50%, based on the intersection of the text spans. The matching score comparing the second annotation with the third annotation is 0%, because the same text span was labeled differently.
The task agreement conditions use a threshold of 40% to group annotations based on the matching score, so the first and second annotations are matched with each other, and the third annotation is considered mismatched. In this case, task agreement exists for 2 of the 3 annotations, so the overall task agreement score is 67%.
293
docs/source/guide/storage.md Normal file
@ -0,0 +1,293 @@
---
title: Sync data from external storage
type: guide
order: 302
meta_title: Cloud and External Storage Integration
meta_description: Label Studio Documentation for integrating Amazon AWS S3, Google Cloud Storage, Microsoft Azure, Redis, and local file directories with Label Studio to collect data labeling tasks and sync annotation results into your machine learning pipelines for machine learning and data science projects.
---
Integrate popular cloud and external storage systems with Label Studio to collect new items uploaded to the buckets, containers, databases, or directories and return the annotation results so that you can use them in your machine learning pipelines.
Set up the following cloud and other storage systems with Label Studio:
- [Amazon S3](#Amazon-S3)
- [Google Cloud Storage](#Google-Cloud-Storage)
- [Microsoft Azure Blob storage](#Microsoft-Azure-Blob-storage)
- [Redis database](#Redis-database)
- [Local storage](#Local-storage)
Each source and target storage setup is project-specific. You can connect multiple buckets, containers, databases, or directories as source or target storage for a project.
If you upload new data to a connected cloud storage bucket, sync the storage connection to add the new labeling tasks to Label Studio without restarting.
> Choose your target storage carefully. When you start the labeling project, the target storage must be empty or contain annotations that match previously created or imported tasks from source storage. Tasks are synced with annotations based on internal IDs, so if you accidentally connect to target storage with existing annotations with the same IDs, the connection might fail with undefined behavior.
For details about how Label Studio secures access to cloud storage using workspaces and cloud storage credentials, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).
## Amazon S3
To connect your [S3](https://aws.amazon.com/s3) bucket with Label Studio, make sure you have programmatic access enabled. [See the Amazon Boto3 configuration documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) for more on how to set up access to your S3 bucket.
### Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Amazon S3** as the storage type.
5. In the **Storage Title** field, type a name for the storage to appear in the Label Studio UI.
6. Specify the name of the S3 bucket, and if relevant, the bucket prefix to specify an internal folder or container.
7. Adjust the remaining optional parameters:
- In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
- In the **Region Name** field, specify the AWS region name. For example, `us-east-1`.
- In the **S3 Endpoint** field, specify an S3 endpoint.
- In the **Access Key ID** field, specify the access key ID for your AWS account.
- In the **Secret Access Key** field, specify the secret key for your AWS account.
- In the **Session Token** field, specify a session token for your AWS account.
- Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable **Use pre-signed URLs**. For example, if you host Label Studio in the same AWS network as your storage buckets, you can disable presigned URLs and have direct access to the storage using `s3://` links.
- Adjust the counter for how many minutes the pre-signed URLs are valid.
8. Click **Add Storage**.
9. Repeat these steps for **Target Storage** to sync completed data annotations to a bucket.
After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync import storage](/api#operation/api_storages_s3_sync_create).
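For example, a minimal `curl` sketch of the sync call, with a placeholder storage ID and token:
```bash
# sync import storage with ID 1; the ID is returned when you create the storage
curl -X POST http://localhost:8080/api/storages/s3/1/sync \
  -H "Authorization: Token <your-api-token>"
```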
### <i class='ent'></i> Set up an S3 connection with IAM role access
If you want to use a revocable method to grant Label Studio access to your Amazon S3 bucket, use an IAM role and its temporary security credentials instead of an access key ID and secret. This added layer of security is only available in Label Studio Enterprise. For more details about security in Label Studio and Label Studio Enterprise, see [Secure Label Studio](security.html).
> Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.
#### Set up an IAM role in Amazon AWS
Set up an IAM role in Amazon AWS to use with Label Studio.
1. In the Label Studio UI, open the **Organization** page to get an `External ID` to use for the IAM role creation in Amazon AWS. You must be an administrator to view the Organization page.
2. Follow the [Amazon AWS documentation to create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html) in your AWS account. <br/>Make sure to require an external ID and do not require multi-factor authentication when you set up the role. Select an existing permissions policy, or create one that allows programmatic access to the bucket. Use the external ID when you create a trust policy.
3. After you create the IAM role, note the Amazon Resource Name (ARN) of the role. You need it to set up the S3 source storage in Label Studio.
For more details about using an IAM role with an external ID to provide access to a third party (Label Studio), see the Amazon AWS documentation [How to use an external ID when granting access to your AWS resources to a third party](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).
#### Create the connection to S3 in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Amazon S3 (IAM role access)** as the storage type.
5. In the **Storage Title** field, type a name for the storage to appear in the Label Studio UI.
6. Specify the name of the S3 bucket, and if relevant, the bucket prefix to specify an internal folder or container.
7. Adjust the remaining parameters:
- In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
- In the **Region Name** field, specify the AWS region name. For example, `us-east-1`.
- In the **S3 Endpoint** field, specify an S3 endpoint.
- In the **Role ARN** field, specify the Amazon Resource Name (ARN) of the IAM role that you created to grant access to Label Studio.
- In the **External ID** field, specify the external ID that identifies Label Studio to your AWS account. You can find the external ID on your **Organization** page.
- Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable **Use pre-signed URLs**. For example, if you host Label Studio in the same AWS network as your storage buckets, you can disable presigned URLs and have direct access to the storage using `s3://` links.
- Adjust the counter for how many minutes the pre-signed URLs are valid.
8. Click **Add Storage**.
9. Repeat these steps for **Target Storage** to sync completed data annotations to a bucket.
After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync import storage](/api#operation/api_storages_s3_sync_create).
### Add storage with the Label Studio API
You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_s3_create).
- See [Create export storage](/api#operation/api_storages_export_s3_create).
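As a sketch, creating S3 import storage with `curl` might look like the following; the field names and values shown are assumptions to check against the linked API reference:
```bash
curl -X POST http://localhost:8080/api/storages/s3 \
  -H "Authorization: Token <your-api-token>" \
  -H "Content-Type: application/json" \
  -d '{"project": 1, "title": "My S3 source", "bucket": "my-bucket", "prefix": "tasks/", "regex_filter": ".*", "use_blob_urls": true}'
```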
## Google Cloud Storage
Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio.
### Prerequisites
To connect your [GCS](https://cloud.google.com/storage) bucket with Label Studio, set up the following:
- **Enable programmatic access to your bucket.** See [Cloud Storage Client Libraries](https://cloud.google.com/storage/docs/reference/libraries) in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
- **Set up authentication to your bucket.** Your account must have the **Service Account Token Creator** role. See [Setting up authentication](https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication) in the Google Cloud Storage documentation. Use the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to specify a JSON file with GCS credentials. For example:
```bash
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
```
### Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Google Cloud Storage** as the storage type.
5. In the **Storage Title** field, type a name for the storage to appear in the Label Studio UI.
6. Specify the name of the GCS bucket, and if relevant, the bucket prefix to specify an internal folder or container.
7. Adjust the remaining optional parameters:
- In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
- Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable **Use pre-signed URLs**. For example, if you host Label Studio in the same network as your storage buckets, you can disable presigned URLs and have direct access to the storage.
- Adjust the counter for how many minutes the pre-signed URLs are valid.
8. Click **Add Storage**.
9. Repeat these steps for **Target Storage** to sync completed data annotations to a bucket.
After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync import storage](/api#operation/api_storages_gcs_sync_create).
### Add storage with the Label Studio API
You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_gcs_create).
- See [Create export storage](/api#operation/api_storages_export_gcs_create).
## Microsoft Azure Blob storage
Connect your [Microsoft Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) container with Label Studio.
### Prerequisites
You must set two environment variables in Label Studio to connect to Azure Blob storage:
- `AZURE_BLOB_ACCOUNT_NAME` to specify the name of the storage account.
- `AZURE_BLOB_ACCOUNT_KEY` to specify the secret key for the storage account.
Configure the specific Azure Blob container that you want Label Studio to use in the UI.
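For example, on a *nix system you might export them before starting Label Studio, with placeholder values:
```bash
export AZURE_BLOB_ACCOUNT_NAME=<your-account-name>
export AZURE_BLOB_ACCOUNT_KEY=<your-account-key>
```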
### Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Microsoft Azure** as the storage type.
5. In the **Storage Title** field, type a name for the storage to appear in the Label Studio UI.
6. Specify the name of the Azure Blob container, and if relevant, the container prefix to specify an internal folder or container.
7. Adjust the remaining optional parameters:
- In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
- In the **Account Name** field, specify the account name for the Azure storage. You can also set this field as an environment variable,`AZURE_BLOB_ACCOUNT_NAME`.
- In the **Account Key** field, specify the secret key to access the storage account. You can also set this field as an environment variable,`AZURE_BLOB_ACCOUNT_KEY`.
- Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, for example `azure-blob://container-name/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable **Use pre-signed URLs**, or [shared access signatures](https://docs.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature). For example, if you host Label Studio in the same network as your storage containers, you can disable presigned URLs and have direct access to the storage.
- Adjust the counter for how many minutes the shared access signatures are valid.
8. Click **Add Storage**.
9. Repeat these steps for **Target Storage** to sync completed data annotations to a container.
After adding the storage, click **Sync** to collect tasks from the container, or make an API call to [sync import storage](/api#operation/api_storages_azure_sync_create).
### Add storage with the Label Studio API
You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_azure_create).
- See [Create export storage](/api#operation/api_storages_export_azure_create).
## Redis database
You can also store your tasks and annotations in a [Redis database](https://redis.io/). You must store the tasks and annotations in different databases. You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.
Currently, this configuration is only supported if the Redis database is hosted in the default mode, with the default IP address.
Label Studio does not manage the Redis database for you. See the [Redis Quick Start](https://redis.io/topics/quickstart) for details about hosting and managing your own Redis database. Because Redis is an in-memory database, data saved in Redis does not persist. To make sure you don't lose data, set up [Redis persistence](https://redis.io/topics/persistence) or use another method to persist the data, such as using Redis in the cloud with [Microsoft Azure](https://azure.microsoft.com/en-us/services/cache/) or [Amazon AWS](https://aws.amazon.com/redis/).
### Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Redis Database** as the storage type.
5. Update the Redis configuration parameters:
- In the **Path** field, specify the path to the database. The path is used as the key prefix; values under this path are scanned for tasks.
- In the **Password** field, specify the server password.
- In the **Host** field, specify the IP of the server hosting the database, or `localhost`.
- In the **Port** field, specify the port that you can use to access the database.
- In the **File Filter Regex** field, specify a regular expression to filter database objects. Use `.*` to collect all objects.
- Enable **Treat every bucket object as a source file** if your database contains files such as JPG, MP3, or similar file types. This setting creates a URL for each database object to use for labeling. Leave this option disabled if you have multiple JSON files in the database, with one task per JSON file.
6. Click **Add Storage**.
7. Repeat these steps for **Target Storage** to sync completed data annotations to a database.
After adding the storage, click **Sync** to collect tasks from the database, or make an API call to [sync import storage](/api#operation/api_storages_redis_sync_create).
### Add storage with the Label Studio API
You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_redis_create).
- See [Create export storage](/api#operation/api_storages_export_redis_create).
## Local storage
If you have local files that you want to add to Label Studio from a specific directory, you can set up a specific local directory on the machine where Label Studio is running as source or target storage. Label Studio steps through the directory recursively to read tasks.
### Tasks with local storage file references
In cases where your tasks have multiple or complex input sources, such as multiple object tags in the labeling config or a HyperText tag with custom data values, you must prepare tasks manually.
In those cases, you can add local storage without syncing (to avoid automatic task creation from storage files) and specify the local files in your data values. For example, to specify multiple data types in the Label Studio JSON format, specifically an audio file `1.wav` and an image file `1.jpg`:
```
{
"data": {
"audio": "/data/local-files/?d=dataset1/1.wav",
"image": "/data/local-files/?d=dataset1/1.jpg"
}
}
```
### Prerequisites
Add these variables to your environment setup:
- `LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true`
- `LOCAL_FILES_DOCUMENT_ROOT=/home/user` (or `LOCAL_FILES_DOCUMENT_ROOT=C:\\data\\media` for Windows).
Without these settings, Local storage and URLs in tasks that point to local files won't work. Keep in mind that serving data from the local file system can be a **security risk**. See [Set environment variables](start.html#Set_environment_variables) for more about using environment variables.
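For example, on a *nix system:
```bash
export LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
export LOCAL_FILES_DOCUMENT_ROOT=/home/user
```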
### Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
<img src="/images/local-storage-settings.png" alt="Screenshot of the storage settings modal described in the preceding steps." width=670 height=490 style="border: 1px solid #eee">
4. In the dialog box that appears, select **Local Files** as the storage type.
5. In the **Storage Title** field, type a name for the storage to appear in the Label Studio UI.
6. Specify an **Absolute local path** to the directory with your files. The local path must be an absolute path and include the `LOCAL_FILES_DOCUMENT_ROOT` value.
For example, if `LOCAL_FILES_DOCUMENT_ROOT=/home/user`, then your local path must be `/home/user/dataset1`. For more about that environment variable, see [Run Label Studio on Docker and use local storage](start.html#Run_Label_Studio_on_Docker_and_use_local_storage).
7. (Optional) In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
8. (Optional) Toggle **Treat every bucket object as a source file**.
- Enable this option if you want to create Label Studio tasks from media files automatically, such as JPG, MP3, or similar file types. Use this option for labeling configurations with one source tag.
- Disable this option if you want to import tasks in Label Studio JSON format directly from your storage. Use this option for complex labeling configurations with HyperText or multiple source tags.
9. Click **Add Storage**.
10. Repeat these steps for **Add Target Storage** to use a local file directory for exporting.
After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync import storage](/api#operation/api_storages_localfiles_sync_create).
#### Add storage with the Label Studio API
You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_localfiles_create).
- See [Create export storage](/api#operation/api_storages_export_localfiles_create).
### Set up local storage with Docker
If you're using Label Studio in Docker, you need to mount the local directory that you want to access as a volume when you start the Docker container. See [Run Label Studio on Docker and use local storage](start.html#Run-Label-Studio-on-Docker-and-use-local-storage).
## Troubleshoot CORS and access problems
Troubleshoot some common problems when using cloud or external storage with Label Studio.
### I can't see the data in my tasks
Check your web browser console for errors.
- If you see CORS problems, make sure you have CORS set up properly.
<img src='/images/cors-error-2.png' style="opacity: 0.9; max-width: 500px">
- For Amazon S3, see [Configuring and using cross-origin resource sharing (CORS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/cors.html) in the Amazon S3 User Guide.
- For GCS, see [Configuring cross-origin resource sharing (CORS)](https://cloud.google.com/storage/docs/configuring-cors) in the Google Cloud Storage documentation.
- For Microsoft Azure, see [Cross-Origin Resource Sharing (CORS) support for Azure Storage](https://docs.microsoft.com/en-us/rest/api/storageservices/cross-origin-resource-sharing--cors--support-for-the-azure-storage-services) in the Microsoft Azure documentation.
- If you see 403 errors, make sure you configured the correct credentials.
- For Amazon S3, see [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) in the Amazon AWS Command Line Interface User Guide.
- For GCS, see [Setting up authentication](https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication) in the Google Cloud Storage documentation. Your account must have the `Service Account Token Creator` role.
- For Amazon S3, make sure you specified the correct region when creating a bucket. If needed, change the region in your source or target storage settings or the `.aws/config` file, otherwise you might have problems accessing your bucket objects.
For example, update the following: `~/.aws/config`
```
[default]
region=us-east-2 # change to the region of your bucket
```
### Tasks do not sync
If you're pressing the **Sync** button but tasks do not sync, or you can't see the new tasks in the Data Manager, check the following:
- Make sure you specified the correct credentials.
- For Amazon S3, see [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) in the Amazon AWS Command Line Interface User Guide. Also be sure to check that they work from the [aws client](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
- For GCS, see [Setting up authentication](https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication) in the Google Cloud Storage documentation. Your account must have the `Service Account Token Creator` role.
- Make sure that files exist under the specified bucket or container prefix, and that your file filter regex matches them. When you set the prefix, subfolders are not recursively scanned.
### Tasks don't load the way I expect
If the tasks sync to Label Studio but don't appear the way that you expect, maybe with URLs instead of images or with one task where you expect to see many, check the following:
- If you're placing JSON files in [cloud storage](storage.html), place 1 task in each JSON file in the storage bucket. If you want to upload a JSON file from your machine directly into Label Studio, you can place multiple tasks in one JSON file.
- If you're syncing image or audio files, make sure **Treat every bucket object as a source file** is enabled.
82
docs/source/guide/storedata.md Normal file
@ -0,0 +1,82 @@
---
title: Database setup
type: guide
order: 205
meta_title: Database Storage Setup
meta_description: Configure the database storage used by Label Studio in your data labeling and machine learning projects to ensure performant and scalable data and configuration storage.
---
Label Studio uses a database to store project data and configuration information.
## Labeling performance
The SQLite database works well for projects with tens of thousands of labeling tasks. If you want to annotate millions of tasks or anticipate a lot of concurrent users, use a PostgreSQL database. See [Install and upgrade Label Studio](install.html#PostgreSQL-database) for more.
For example, with SQLite, if you import data while labeling is in progress, labeling tasks can take more than 10 seconds to load and annotations can take more than 10 seconds to save. If you want to label more than 100,000 tasks with 5 or more concurrent users, consider using PostgreSQL or another database with Label Studio.
## SQLite database
Label Studio uses SQLite by default. You don't need to configure anything. Label Studio stores all data in a single file in the data directory of the user that runs Label Studio. After you [start Label Studio](start.html), the directory being used is printed in the terminal.
## PostgreSQL database
You can also store your tasks and annotations in a [PostgreSQL database](https://www.postgresql.org/) instead of the default SQLite database. This is recommended if you intend to frequently import new labeling tasks, or plan to label hundreds of thousands of tasks or more across projects.
### Create connection on startup
Run the following command to launch Label Studio, configure the connection to your PostgreSQL database, scan for existing tasks, and load them into the app for labeling for a specific project.
```bash
label-studio start my_project --init -db postgresql
```
You must set the following environment variables to connect Label Studio to PostgreSQL:
```
DJANGO_DB=default
POSTGRE_NAME=postgres
POSTGRE_USER=postgres
POSTGRE_PASSWORD=
POSTGRE_PORT=5432
POSTGRE_HOST=db
```
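For example, on a *nix system you might export them before starting Label Studio, substituting the values for your own database:
```bash
export DJANGO_DB=default
export POSTGRE_NAME=postgres
export POSTGRE_USER=postgres
export POSTGRE_PASSWORD=<your-password>
export POSTGRE_PORT=5432
export POSTGRE_HOST=db
label-studio start my_project
```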
### Create connection with Docker Compose
When you start Label Studio using Docker Compose, it starts with a PostgreSQL database:
```bash
docker-compose up -d
```
## Data persistence
If you're using a Docker container, Heroku, or another cloud provider, you might want your data to persist after shutting down Label Studio. You can [export your data](export.html) to persist your labeling task data and annotations, but to preserve the state of Label Studio and assets such as files that you uploaded for labeling, set up data persistence.
### Persist data with Docker
Mount Docker volumes on your machine to persist the internal SQLite database and assets that you upload to Label Studio after you terminate a Docker container running Label Studio.
If you're starting a Docker container from the command line, use volumes to persist the data. See the Docker documentation for [Use volumes](https://docs.docker.com/storage/volumes/). For example, replace the existing volume flag in the Docker command with a volume that you specify:
```bash
docker run -it -p 8080:8080 -v <yourvolume>:/label-studio/data heartexlabs/label-studio:latest
```
If you're using Docker Compose with the [config included in the Label Studio repository](https://github.com/heartexlabs/label-studio/blob/master/docker-compose.yml), you can set up Docker volumes in the `docker-compose.yml` file for Label Studio:
```
version: "3.3"
services:
label_studio:
image: heartexlabs/label-studio:latest
container_name: label_studio
ports:
- 8080:8080
volumes:
- ./mydata:/label-studio/data
```
For more about specifying volumes in Docker Compose, see the volumes section of the [Docker Compose file documentation](https://docs.docker.com/compose/compose-file/compose-file-v3/#volumes).
### Persist data with a cloud provider
Host a PostgreSQL server that you manage and set up the PostgreSQL environment variables with Label Studio to persist data from a cloud provider such as Heroku, Amazon Web Services, Google Cloud Services, or Microsoft Azure.
293
docs/source/guide/tasks.md Normal file
@ -0,0 +1,293 @@
---
title: Get data into Label Studio
short: Get data
type: guide
order: 300
meta_title: Import Data into Label Studio
meta_description: Import and upload data labeling tasks from audio, HTML, image, CSV, text, and time series datasets using common file formats or the Label Studio JSON format to label and annotate that data for your machine learning and data science projects.
---
Get data into Label Studio by importing files, referencing URLs, or syncing with cloud or database storage.
- If your data is stored in a cloud storage bucket, see [Sync data from cloud or database storage](storage.html).
- If your data is stored in a Redis database, see [Sync data from cloud or database storage](storage.html).
- If your data is stored at internet-accessible URLs, in files, or directories, [import it from the Label Studio UI](#Import-data-from-the-Label-Studio-UI).
- If your data is stored locally, [import it into Label Studio](#Import-data-from-a-local-directory).
- If your data contains predictions or pre-annotations, see [Import pre-annotated data into Label Studio](predictions.html).
## Types of data you can import into Label Studio
You can import many different types of data, including text, timeseries, audio, and image data. The file types supported depend on the type of data.
| Data type | Supported file types |
| --- | --- |
| Audio | .aiff, .au, .flac, .m4a, .mp3, .ogg, .wav |
| HTML | .html, .htm, .xml |
| Images | .bmp, .gif, .jpg, .png, .svg, .webp |
| Structured data | .csv, .tsv, .json |
| Text | .txt |
| Time series | .csv, .tsv |
If you don't see a supported data or file type that you want to import, reach out in the [Label Studio Slack community](http://slack.labelstud.io.s3-website-us-east-1.amazonaws.com?source=docs-gdi).
## How to format your data to import it
Label Studio treats different file types in different ways.
If you want to import multiple types of data to label at the same time, for example, images with captions or audio recordings with transcripts, you must use the [basic Label Studio JSON format](#Basic-Label-Studio-JSON-format).
If you need to import thousands of files, you can use a CSV file or a JSON list of tasks that points to URLs with the data, rather than importing the data directly. You can import files containing up to 250,000 tasks or up to 50 MB in size into Label Studio.
If you're specifying data in a cloud storage bucket or container, and you don't want to [sync cloud storage](storage.html), create and specify [presigned URLs for Amazon S3 storage](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html), [signed URLs for Google Cloud Storage](https://cloud.google.com/storage/docs/access-control/signed-urls), or [shared access signatures for Microsoft Azure](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) in a JSON, CSV, or TXT file.
### Basic Label Studio JSON format
One way to import data into Label Studio is using a JSON-formatted list of tasks. The `data` key of the JSON file references each task as an entry in a JSON dictionary. If there is no `data` key, Label Studio interprets the entire JSON file as one task.
In the `data` JSON dictionary, use key-value pairs that correspond to the source key expected by the object tag in the [label config](setup.html#Customize-the-labeling-interface-for-your-project) that you set up for your dataset.
Depending on the type of object tag, Label Studio interprets field values differently:
- `<Text value="$key">`: `value` is interpreted as plain text.
- `<HyperText value="$key">`: `value` is interpreted as HTML markup.
- `<HyperText value="$key" encoding="base64">`: `value` is interpreted as a base64 encoded HTML markup.
- `<Audio value="$key">`: `value` is interpreted as a valid URL to an audio file.
- `<AudioPlus value="$key">`: `value` is interpreted as a valid URL to an audio file with CORS policy enabled on the server side.
- `<Image value="$key">`: `value` is interpreted as a valid URL to an image file.
- `<TimeSeries value="$key">`: `value` is interpreted as a valid URL to a CSV/TSV file if `valueType="url"`, otherwise it is interpreted as a JSON dictionary with column arrays: `"value": {"first_column": [...], ...}` if `valueType="json"`.
You can add other, optional keys to the JSON file.
| JSON key | Description |
| --- | --- |
| `id` | Optional. Integer to use as the task ID. |
| `annotations` | Optional. List of annotations exported from Label Studio. [Label Studio's annotation format](export.html#Raw-JSON-format-of-completed-tasks) allows you to import annotation results in order to use them in subsequent labeling tasks. |
| `predictions` | Optional. List of model prediction results, where each result is saved using [Label Studio's prediction format](export.html#Raw-JSON-format-of-completed-tasks). Import predictions for automatic task pre-labeling and active learning. See [Import predicted labels into Label Studio](predictions.html). |
### Example JSON format
For an example text classification project, you can set up a label config like the following:
```html
<View>
<Text name="message" value="$my_text"/>
<Choices name="sentiment_class" toName="message">
<Choice value="Positive"/>
<Choice value="Neutral"/>
<Choice value="Negative"/>
</Choices>
</View>
```
You can then import text tasks to label that match the following JSON format:
```yaml
[{
# "data" must contain the "my_text" field defined in the text labeling config as the value and can optionally include other fields
"data": {
"my_text": "Opossums are great",
"ref_id": 456,
"meta_info": {
"timestamp": "2020-03-09 18:15:28.212882",
"location": "North Pole"
}
},
# annotations are not required and are the list of annotation results matching the labeling config schema
"annotations": [{
"result": [{
"from_name": "sentiment_class",
"to_name": "message",
"type": "choices",
"value": {
"choices": ["Positive"]
}
}]
}],
# "predictions" are pretty similar to "annotations"
# except that they also include some ML-related fields like a prediction "score"
"predictions": [{
"result": [{
"from_name": "sentiment_class",
"to_name": "message",
"type": "choices",
"value": {
"choices": ["Neutral"]
}
}],
# score is used for active learning sampling mode
"score": 0.95
}]
}]
```
If you're placing JSON files in [cloud storage](storage.html), place 1 task in each JSON file in the storage bucket. If you want to upload a JSON file from your machine directly into Label Studio, you can place multiple tasks in one JSON file.
#### Example JSON with multiple tasks
You can place multiple tasks in one JSON file if you're uploading the JSON file to Label Studio.
<br/>
{% details <b>To place multiple tasks in one JSON file, use this JSON format example</b> %}
This example contains multiple text classification tasks with no annotations or predictions.
The "data" parameter must contain the "my_text" field defined in the text labeling config and can optionally include other fields. The "id" parameter is not required.
{% codeblock lang:json %}
[
{
"id":1,
"data":{
"my_text":"Opossums like to be aloft in trees."
}
},
{
"id":2,
"data":{
"my_text":"Opossums are opportunistic."
}
},
{
"id":3,
"data":{
"my_text":"Opossums like to forage for food."
}
}
]
{% endcodeblock %}
{% enddetails %}
#### Example JSON for older versions of Label Studio
If you're still using a Label Studio version earlier than 1.0.0, refer to this example JSON format.
<br/>
{% details <b>For versions of Label Studio earlier than 1.0.0, use this JSON format example.</b> %}
If you're using a version of Label Studio earlier than version 1.0.0, import tasks that match the following JSON format:
{% codeblock lang:json %}
[{
# "data" must contain the "my_text" field defined by labeling config,
# and can optionally include other fields
"data": {
"my_text": "Opossums are great",
"ref_id": 456,
"meta_info": {
"timestamp": "2020-03-09 18:15:28.212882",
"location": "North Pole"
}
},
# completions are the list of annotation results matching the labeling config schema
"completions": [{
"result": [{
"from_name": "sentiment_class",
"to_name": "message",
"type": "choices",
"value": {
"choices": ["Positive"]
}
}]
}],
# "predictions" are pretty similar to "completions"
# except that they also include some ML-related fields like a prediction "score"
"predictions": [{
"result": [{
"from_name": "sentiment_class",
"to_name": "message",
"type": "choices",
"value": {
"choices": ["Neutral"]
}
}],
# score is used for active learning sampling mode
"score": 0.95
}]
}]
{% endcodeblock %}
{% enddetails %}
### Import CSV or TSV data
When you import a CSV or TSV formatted text file, Label Studio interprets the column names as task data keys that correspond to the labeling config you set up:
```csv
my_text,optional_field
this is a first task,123
this is a second task,456
```
> Note: If your labeling config has a TimeSeries tag, Label Studio interprets the CSV/TSV as time series data when you import it. This CSV/TSV is hosted as a resource file and Label Studio automatically creates a task with a link to the uploaded CSV/TSV.
### Plain text
Import data as plain text. Label Studio interprets each line in a plain text file as a separate data labeling task.
You might use plain text for labeling tasks if you have only one stream of input data, and only one [object tag](/tags) specified in your label config.
```text
this is a first task
this is a second task
```
If you want to import entire plain text files without each line becoming a new labeling task, customize the labeling configuration to specify `valueType="url"` in the Text tag. See the [Text tag documentation](/tags/text.html). With this setting, when you export your tasks you export links to the raw data created by Label Studio, rather than the raw data itself. If you want to export tasks with data, and label text files with new lines, use the [Label Studio JSON format](#Basic-Label-Studio-JSON-format).
## Import data from a local directory
To import data from a local directory, you have two options:
- Run a web server to generate URLs for the files, then upload a file that references the URLs to Label Studio.
- Add the file directory as a source or target [local storage](storage.html#Local-storage) connection in the Label Studio UI.
### Run a web server to generate URLs to local files
To run a web server to generate URLs for the files, you can use the provided [helper shell script in the Label Studio repository](https://github.com/heartexlabs/label-studio/blob/master/scripts/serve_local_files.sh) or write your own script.
Use that script to do the following:
1. On the machine with the file directory that you want Label Studio to import, call the helper script and specify a file pattern to match the files that you want to import. In this example, the script identifies files with the JPG file extension:
```bash
./scripts/serve_local_files.sh <directory/with/files> *.jpg
```
The script starts an HTTP server, collects the links to the files that it serves, and saves them to a `files.txt` file with one URL per line (an illustrative example follows this list).
2. Import the file with URLs into Label Studio using the Label Studio UI.
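For illustration, the generated `files.txt` might look like the following (the host, port, and file names are hypothetical):
```text
http://192.168.0.42:8000/images/photo_001.jpg
http://192.168.0.42:8000/images/photo_002.jpg
```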
If your labeling configuration supports HyperText or multiple data types, use the Label Studio JSON format to specify the local file locations instead of a `txt` file. See [an example of this format](storage.html#Tasks-with-local-storage-file-references).
If you serve your data from an HTTP server created as follows: `python -m http.server 8081 -d`, you might need to set up CORS for that server so that Label Studio can access the data files successfully. If needed, you can instead serve the files with an HTTP server that has CORS enabled:
```bash
npm install http-server -g
http-server -p 3000 --cors
```
### Add the file directory as source storage in the Label Studio UI
If you're running Label Studio on Docker and want to add local file storage, you need to mount the file directory and set up environment variables. See [Run Label Studio on Docker and use local storage](start.html#Run-Label-Studio-on-Docker-and-use-local-storage).
## Import data from the Label Studio UI
To import data from the Label Studio UI, do the following:
1. On the Label Studio UI, open a specific project.
2. Click **Import** to open the import page available at [http://localhost:8080/import](http://localhost:8080/import).
3. Import your data from files or URLs.
Data that you import is project-specific.
### Import data using the API
Import your data using the Label Studio API. See the [API documentation for importing tasks](/api#operation/projects_import_create).
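As a rough sketch using Python (the host, project ID `1`, and token are placeholders; confirm the exact endpoint and authentication scheme in the linked API documentation):
```python
import requests

# Hypothetical host, project ID, and API token; adjust for your instance.
response = requests.post(
    'http://localhost:8080/api/projects/1/import',
    headers={'Authorization': 'Token <your-api-token>'},
    files={'file': open('tasks.json', 'rb')},
)
response.raise_for_status()
print(response.json())
```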
### Import data from the command line
In versions of Label Studio earlier than 1.0.0, you can import data from a local directory using the command line.
To import data from the command line, do the following:
1. Start Label Studio and use command line arguments to specify the path to the data and format of the data. <br/>For example: <br/>`label-studio init --input-path my_tasks.json --input-format json`
2. Open the Label Studio UI and confirm that your data was properly imported.
You can use the `--input-path` argument to specify a file or directory with the data that you want to label. You can specify other data formats using the `--input-format` argument. For example, run the following command to start Label Studio and import audio files from a local directory:
```bash
label-studio init my-project --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files
```
> WARNING: The `--allow-serving-local-files` argument is intended for use only with locally running instances of Label Studio. Avoid using it on remote servers unless you know what you're doing.
By default, Label Studio expects JSON-formatted tasks using the [Basic Label Studio JSON format](tasks.html#Basic-Label-Studio-JSON-format).
If you add more files to a local directory after Label Studio starts, you must restart Label Studio to import the tasks in the additional files.

View File

@ -0,0 +1,10 @@
## How Label Studio saves results in annotations
Each annotation that you create when you label a task contains regions and results.
- **Regions** refer to the selected area of the data, whether a text span, image area, audio segment, or something else.
- **Results** refer to the labels assigned to the region.
Each region has an ID that is unique within the annotation, formed as a string of the characters `A-Za-z0-9_-`. Each result ID is the same as the ID of the region that it applies to.
When a prediction is used to create an annotation, the result IDs stay the same in the annotation field. This lets you track the regions generated by your machine learning model and compare them directly to the human-created and reviewed annotations.
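For illustration, a single result inside an annotation might look like the following sketch (all values are hypothetical); the `id` is the region ID, and any other result applied to the same region reuses it:
```json
{
  "id": "kD3jU9_x7",
  "from_name": "label",
  "to_name": "text",
  "type": "labels",
  "value": {
    "start": 0,
    "end": 7,
    "labels": ["Person"]
  }
}
```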

View File

@ -0,0 +1,15 @@
## System requirements
You can install Label Studio on a Linux, Windows, or macOS machine running Python 3.6 or later.
### Port requirements
Label Studio expects port 8080 to be open by default. To use a different port, specify it when starting Label Studio. See [Start Label Studio](start.html).
### Server requirements
Allocate disk space according to the amount of data you plan to label. As a benchmark, 1 million labeling tasks take up approximately 2.3GB on disk when using the SQLite database. 50GB of disk space is recommended for production instances.
Use a minimum of **8GB RAM**, but 16GB RAM is recommended; for example, a t3.large or t3.xlarge instance on Amazon AWS.
For more on using Label Studio at scale and labeling performance, see [Start Label Studio](start.html).
### Software requirements
PostgreSQL version 11.5 or higher, or SQLite version 3.35 or higher.

View File

@ -0,0 +1,76 @@
<!-- Unfortunately, included md files don't support code highlighting, so do it manually -->
<script src="/js/highlight.min.js"></script>
<script>hljs.highlightAll();</script>
## Units of image annotations
The x, y, width, and height of image annotations are provided as percentages of the overall image dimensions.
Use the following conversion formulas for `x, y, width, height`:
```python
pixel_x = x / 100.0 * original_width
pixel_y = y / 100.0 * original_height
pixel_width = width / 100.0 * original_width
pixel_height = height / 100.0 * original_height
```
For example:
```python
task = {
    "annotations": [{
        "result": [
            {
                "...": "...",
                "original_width": 600,
                "original_height": 403,
                "image_rotation": 0,
                "value": {
                    "x": 5.33,
                    "y": 23.57,
                    "width": 29.16,
                    "height": 31.26,
                    "rotation": 0,
                    "rectanglelabels": [
                        "Airplane"
                    ]
                }
            }
        ]
    }]
}

# convert from LS percent units to pixels
def convert_from_ls(result):
    if 'original_width' not in result or 'original_height' not in result:
        return None

    value = result['value']
    w, h = result['original_width'], result['original_height']

    if all([key in value for key in ['x', 'y', 'width', 'height']]):
        return w * value['x'] / 100.0, \
               h * value['y'] / 100.0, \
               w * value['width'] / 100.0, \
               h * value['height'] / 100.0

# convert from pixels to LS percent units
def convert_to_ls(x, y, width, height, original_width, original_height):
    return x / original_width * 100.0, y / original_height * 100.0, \
           width / original_width * 100.0, height / original_height * 100.0

# convert from LS
output = convert_from_ls(task['annotations'][0]['result'][0])
if output is None:
    raise ValueError('Conversion failed: the result is missing required fields')
pixel_x, pixel_y, pixel_width, pixel_height = output
print(pixel_x, pixel_y, pixel_width, pixel_height)

# convert back to LS
x, y, width, height = convert_to_ls(pixel_x, pixel_y, pixel_width, pixel_height, 600, 403)
print(x, y, width, height)
```

2
docs/source/index.md Normal file
View File

@ -0,0 +1,2 @@
index: true
---

View File

@ -0,0 +1,662 @@
---
type: playground
order: 201
meta_title: Data Labeling & Annotation Tool Interactive Demo
meta_description: Label Studio interactive demo and playground for data labeling and annotation for machine learning and data science projects.
---
<style>
.sidebar {
display: none;
}
.content {
max-width: none !important;
margin-left: 0 !important;
padding: 1em 0 0 0;
}
.validation {
margin-top: 1em;
margin-left: 1em;
color: red;
text-transform: capitalize;
}
.CodeMirror {
min-height: 500px !important;
}
h1 {
margin-bottom: 0.5em !important;
}
h3 {
margin: 1em 0 !important;
font-weight: normal;
width: unset;
height: unset;
}
iframe {
border: 0;
margin: 0 !important;
padding: 0 !important;
}
#render-editor {
width: 100%;
}
#editor-wrap {
padding: 0;
margin: 0;
display: none;
}
.preview {
padding: 5px;
overflow: auto;
}
.editor-row {
display: flex;
margin-bottom: 1em;
width: 100% !important;
}
.data-row {
display: flex;
}
.preview-col {
width: 60%;
flex: 1;
}
.editor-area {
border: 1px solid rgba(34,36,38,.15);
border-radius: 0.28571429rem;
}
.config-col {
color: rgba(0,0,0,.6);
margin-right: 2em;
width: 40%;
}
.input-col {
width: 49%;
padding-right: 2%;
}
.output-col {
width: 49%;
}
.hidden {
display: none !important;
}
/* hide title "basic template configs" */
#basic-templates>.title {
display: none;
}
#adv-templates>.title {
margin-bottom: 1em;
cursor: pointer;
}
#adv-templates>.content {
display: none
}
.message, .accordion {
width: 90%;
max-width: 1000px;
margin: 1em auto 1.75em auto;
}
.grid {
display: -webkit-box;
display: -ms-flexbox;
display: flex;
-webkit-box-orient: horizontal;
-webkit-box-direction: normal;
-ms-flex-direction: row;
flex-direction: row;
-ms-flex-wrap: wrap;
flex-wrap: wrap;
-webkit-box-align: stretch;
-ms-flex-align: stretch;
align-items: stretch;
padding: 0;
}
.column {
width: 20% !important;
}
.use-template {
font-weight: normal!important;
}
.use-template:hover {
border-bottom: 1px dashed darkorange;
}
@font-face {
font-family: 'Icons';
src: url("/fonts/icons.eot");
src: url("/fonts/icons.eot?#iefix") format('embedded-opentype'), url("/fonts/icons.woff2") format('woff2'), url("/fonts/icons.woff") format('woff'), url("/fonts/icons.ttf") format('truetype'), url("/fonts/icons.svg#icons") format('svg');
font-style: normal;
font-weight: normal;
font-variant: normal;
text-decoration: inherit;
text-transform: none;
}
i.icon {
opacity: 0.75;
display: inline-block;
margin: 0 0.25rem 0 0;
width: 1.18em;
height: 1em;
font-family: 'Icons';
font-style: normal;
font-weight: normal;
text-decoration: inherit;
text-align: center;
speak: none;
-moz-osx-font-smoothing: grayscale;
-webkit-font-smoothing: antialiased;
-webkit-backface-visibility: hidden;
backface-visibility: hidden;
}
i.icon:before {
background: none !important;
}
i.icon.sound:before {
content: "\f025";
}
i.icon.image:before {
content: "\f03e";
}
i.icon.code:before {
content: "\f121";
}
i.icon.font:before {
content: "\f031";
}
i.icon.video:before {
content: "\f03d";
}
i.icon.share:before {
content: "\f064"
}
i.icon.copy.outline:before {
content: "\f0c5"
}
i.icon.archive:before {
content: "\f187";
}
i.icon.eye:before {
content: "\f06e";
}
i.icon.bullseye:before {
content: "\f140";
}
i.icon.vector.square:before {
content: "\f5cb";
}
i.icon.wave.square:before {
content: "\f83e"
}
i.icon.dropdown:before {
content: "\f0da";
}
i.icon.dropdown.active:before {
content: "\f0d7";
}
.share-buttons {
float:right;
margin: 1.2em 1em 1em 1em;
}
.share-buttons i {
cursor: pointer;
opacity: 0.5 !important;
color: #f58a48;
transition: 0.25s;
}
.share-buttons i:hover {
opacity: 1 !important;
transition: 0.25s;
}
.intro {
max-width: 700px;
margin: 0 auto;
margin-top: 1.5em;
}
@media screen and (max-width: 900px) {
.sidebar {
display: flex;
}
}
@media only screen and (max-width: 767.98px) {
.intro {
padding-left: 0;
}
.grid {
width: auto;
margin-left: 0 !important;
margin-right: 0 !important;
}
.column {
width: 100% !important;
margin: 0 0 !important;
-webkit-box-shadow: none !important;
box-shadow: none !important;
padding: 1rem 1rem !important;
}
.editor-row {
flex-direction: column;
}
.data-row {
flex-direction: column;
}
.preview-col {
width: 100%;
}
.config-col {
width: 100%;
}
.input-col, .output-col {
width: 100%;
}
}
</style>
<script src="js.cookie.js"></script>
<!-- html -->
<div class="intro">
<h3 style="text-align: center">1. Choose annotation template</h3>
</div>
<%- include('template_titles') %>
<div>
<div class="editor-row">
<div class="config-col">
<div>
<h3 style="display: inline-block">2. Edit Labeling config</h3>
<span class="share-buttons">
<i class="icon copy outline" style="cursor: pointer" title="Copy labeling config"></i>
<i class="icon share" style="cursor: pointer" title="Copy link to this playground"></i>
</span>
</div>
<div class="editor-area">
<!-- Textarea -->
<textarea name="label_config" cols="40" rows="10" class="project-form htx-html-editor"
id="id_label_config"></textarea>
</div>
<div style="margin-top: 1em">
Start typing in the config, and you can quickly preview the labeling interface.
At the bottom of the page, you can see live serialization updates
of what Label Studio expects as input and what it produces as a result of your labeling work.
</div>
</div>
<div class="preview-col">
<h3>3. Inspect Interface preview</h3>
<div class="validation"></div>
<div id="editor-wrap">
</div>
<div class="preview" id="preload-editor">
<div class="loading" style="margin: 20px; opacity: 0.8">
<img width="40px" src="/images/loading.gif">
<span style="position: relative; top: -14px">&nbsp;&nbsp;&nbsp;Loading Label Studio, please wait ...</span>
</div>
</div>
</div>
</div>
</div>
<!-- Preview in two cols -->
<div class="data-row">
<div class="input-col">
<h3>Input preview</h3>
<pre class="preview" id="upload-data-example">...</pre>
</div>
<div class="output-col">
<h3>Output preview</h3>
<pre class="preview" id="data-results">...</pre>
</div>
</div>
</div>
<!-- Hidden template codes -->
<empty>
<%- include('template_start') %>
<%- include('template_codes') %>
</empty>
<script>
// copy to clipboard
var copyToClipboard = function (str) {
var el = document.createElement('textarea'); // Create a <textarea> element
el.value = str; // Set its value to the string that you want copied
el.setAttribute('readonly', ''); // Make it readonly to be tamper-proof
el.style.position = 'absolute';
el.style.left = '-9999px'; // Move outside the screen to make it invisible
document.body.appendChild(el); // Append the <textarea> element to the HTML document
var selected =
document.getSelection().rangeCount > 0 // Check if there is any content selected previously
? document.getSelection().getRangeAt(0) // Store selection if found
: false; // Mark as false to know no selection existed before
el.select(); // Select the <textarea> content
document.execCommand('copy'); // Copy - only works as a result of a user action (e.g. click events)
document.body.removeChild(el); // Remove the <textarea> element
if (selected) { // If a selection existed before copying
document.getSelection().removeAllRanges(); // Unselect everything on the HTML document
document.getSelection().addRange(selected); // Restore the original selection
}
};
function uuidv4() {
return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
return v.toString(16);
});
}
var confirm_already_shown = true;
var edit_count = 0;
var current_template_name = 'start';
var current_template_category = 'start';
var page_hash = uuidv4();
var user_hash = Cookies.get('user_hash');
if (user_hash === "null" || !user_hash) {
user_hash = uuidv4();
Cookies.set('user_hash', user_hash);
}
var lookup = {};
$.ajax({
url: 'https://extreme-ip-lookup.com/json/',
success: function(o) { lookup = o },
async: false
});
$(function () {
function addTemplateConfig($el) {
var template_pk = $el.data('value');
var value = $('[data-template-pk="' + template_pk + '"]').html();
// extract readme from config
var starter = '<!-- readme', terminator = '-->';
var start = value.indexOf(starter);
if (start >= 0) {
var body_length = value.indexOf(terminator, start) - start - starter.length;
var readme = value.substr(start + starter.length, body_length);
// find first XML tag (<View> as usual) and start from it
value = value.slice(value.indexOf('<', start + starter.length + body_length + terminator.length))
}
labelEditor.setValue(value);
first_render = true;
}
$('.use-template').on('click', function () {
var $el = $(this);
edit_count = 0;
current_template_name = $el.text();
current_template_category = $($el.parent().parent().find('i')[0]).attr('title');
if (labelEditor.getValue() !== '' && !confirm_already_shown) {
var dialog = $('#confirm-config-template-dialog');
dialog.modal({
closable: true,
keyboardShortcuts: true,
onApprove: function () {
addTemplateConfig($el);
}
}).modal('show');
// close on enter, unfortunately keyboardShortcuts doesn't work
dialog.on('keypress', function (event) {
if (event.keyCode === 13) {
dialog.modal('hide');
addTemplateConfig($el);
}
});
confirm_already_shown = true;
} else {
addTemplateConfig($el);
}
return false;
});
var iframeTimer = null;
function debounce(func, wait, immediate) {
let timeout;
return function () {
const context = this, args = arguments;
const later = () => {
timeout = null;
if (!immediate) func.apply(context, args);
};
const callNow = immediate && !timeout;
clearTimeout(timeout);
timeout = setTimeout(later, wait);
if (callNow) func.apply(context, args);
};
}
var prev_completion = null;
// serialize editor output by timer
setInterval(function () {
let iframe = document.getElementById('render-editor');
if (iframe !== null) {
let Htx = iframe.contentWindow.Htx;
if (typeof Htx !== 'undefined') {
var completion = JSON.stringify(Htx.completionStore.selected.serializeCompletion(), null, 4);
if (prev_completion !== completion) {
if (completion.length > 3000) {
completion = completion.slice(0, 3000) + ' ...';
}
$('#data-results').text(completion);
prev_completion = completion;
}
}
}
}, 500);
var host = "https://app.heartex.ai";
var url_string = window.location.href;
var url = new URL(url_string);
// Label code mirror
var labelEditor = CodeMirror.fromTextArea(document.getElementById('id_label_config'), {
lineNumbers: true,
mode: "text/html"
});
labelEditor.focus();
var _c = url.searchParams.get("config");
if (_c && _c.length > 0) {
var config = url.searchParams.get("config");
config = config.replace(/[<][b][r][>]/gm, "\n");
labelEditor.setValue(config);
} else {
labelEditor.setValue($('#start-template').html());
}
validate_config(labelEditor);
// refresh for proper line numbers drawing
labelEditor.refresh();
// add validation
labelEditor.on('change', debounce(function (editor) {
validate_config(editor);
}, 500));
window.labelEditor = labelEditor;
function validate_name() {
let name = $('#id_title').val();
validation_message('', 0);
return 0;
}
function validation_message(msg, status) {
let o = $('.validation');
o.text(msg);
if (status === -1) {
o.removeClass('hidden');
o.addClass('visible');
}
if (status === 0) {
o.removeClass('visible');
o.addClass('hidden');
}
}
// storage of validation results
// let is_collection_ok = false;
let is_label_ok = false;
function editor_iframe(res) {
// generate new iframe
let iframe = $('<iframe></iframe>');
iframe.className = "editor-preview";
// add iframe to wrapper div
$('#editor-wrap').html(iframe);
$('#editor-wrap').fadeIn();
iframe.on('load', function () {
// remove old iframe
$('#render-editor').hide();
$('#render-editor').remove();
// assign id to new iframe
iframe.attr('id', 'render-editor');
// force to hide undo / redo / reset buttons
$('#render-editor').contents().find('head').append('<style>.ls-panel{display:none;}' +
'.ls-editor { margin-left: 3px!important;}</style>');
iframe.show();
let obj = document.getElementById('render-editor');
// wait until all images and resources from iframe loading
clearTimeout(iframeTimer);
iframeTimer = setInterval(function () {
if (obj.contentWindow) {
obj.style.height = (obj.contentWindow.document.body.scrollHeight) + 'px';
}
}, 100);
// hide "..."
$('#preload-editor').hide();
});
// load new data into iframe
iframe.attr('srcdoc', res);
}
function show_render_editor(editor) {
let config = labelEditor.getValue();
edit_count++;
$.ajax({
url: host + '/demo/render-editor?full_editor=t&playground=1',
method: 'POST',
xhrFields: { withCredentials: true },
data: {
config: config,
lookup: lookup,
page_hash: page_hash,
user_hash: user_hash,
current_template_name : current_template_name,
current_template_category: current_template_category,
edit_count: edit_count
},
success: editor_iframe,
error: function () {
$('#preload-editor').show();
}
})
}
// send request to server with configs to validate
function validate_config(editor) {
// get current scheme type from current editor
let url = host + '/api/projects/validate/';
let val = labelEditor.getValue();
if (!val.length)
return;
// label config validation
$.ajax({
url: url,
method: 'POST',
data: {label_config: val},
success: function (res) {
is_label_ok = true;
validation_message('', 0);
$('#render-editor').show();
show_render_editor(editor);
// check_submit_button();
},
error: function (res) {
is_label_ok = false;
validation_message(res.responseJSON['label_config'][0], -1);
$('#render-editor').hide();
// check_submit_button();
}
});
// load sample task
$.post({
url: host + '/business/projects/upload-example/?playground=1',
data: {label_config: val}
})
.fail(function(o) {
$('#upload-data-example').text('...')
})
.done(function(o) {
$('#upload-data-example').text(JSON.stringify(JSON.parse(o), null, 4))
})
}
$('.share-buttons .copy').on('click', function(event) {
copyToClipboard(labelEditor.getValue());
$(event.target).css('color', 'green');
});
$('.share-buttons .share').on('click', function(event) {
let config = labelEditor.getValue();
config = encodeURIComponent(config.replace(/(\r\n|\n|\r)/gm, "<br>"));
let link = window.location.origin + window.location.pathname + '?config=' + config;
copyToClipboard(link);
$(event.target).css('color', 'green');
});
});
</script>

View File

@ -0,0 +1,166 @@
/*!
* JavaScript Cookie v2.2.0
* https://github.com/js-cookie/js-cookie
*
* Copyright 2006, 2015 Klaus Hartl & Fagner Brack
* Released under the MIT license
*/
;(function (factory) {
var registeredInModuleLoader = false;
if (typeof define === 'function' && define.amd) {
define(factory);
registeredInModuleLoader = true;
}
if (typeof exports === 'object') {
module.exports = factory();
registeredInModuleLoader = true;
}
if (!registeredInModuleLoader) {
var OldCookies = window.Cookies;
var api = window.Cookies = factory();
api.noConflict = function () {
window.Cookies = OldCookies;
return api;
};
}
}(function () {
function extend () {
var i = 0;
var result = {};
for (; i < arguments.length; i++) {
var attributes = arguments[ i ];
for (var key in attributes) {
result[key] = attributes[key];
}
}
return result;
}
function init (converter) {
function api (key, value, attributes) {
var result;
if (typeof document === 'undefined') {
return;
}
// Write
if (arguments.length > 1) {
attributes = extend({
path: '/'
}, api.defaults, attributes);
if (typeof attributes.expires === 'number') {
var expires = new Date();
expires.setMilliseconds(expires.getMilliseconds() + attributes.expires * 864e+5);
attributes.expires = expires;
}
// We're using "expires" because "max-age" is not supported by IE
attributes.expires = attributes.expires ? attributes.expires.toUTCString() : '';
try {
result = JSON.stringify(value);
if (/^[\{\[]/.test(result)) {
value = result;
}
} catch (e) {}
if (!converter.write) {
value = encodeURIComponent(String(value))
.replace(/%(23|24|26|2B|3A|3C|3E|3D|2F|3F|40|5B|5D|5E|60|7B|7D|7C)/g, decodeURIComponent);
} else {
value = converter.write(value, key);
}
key = encodeURIComponent(String(key));
key = key.replace(/%(23|24|26|2B|5E|60|7C)/g, decodeURIComponent);
key = key.replace(/[\(\)]/g, escape);
var stringifiedAttributes = '';
for (var attributeName in attributes) {
if (!attributes[attributeName]) {
continue;
}
stringifiedAttributes += '; ' + attributeName;
if (attributes[attributeName] === true) {
continue;
}
stringifiedAttributes += '=' + attributes[attributeName];
}
return (document.cookie = key + '=' + value + stringifiedAttributes);
}
// Read
if (!key) {
result = {};
}
// To prevent the for loop in the first place assign an empty array
// in case there are no cookies at all. Also prevents odd result when
// calling "get()"
var cookies = document.cookie ? document.cookie.split('; ') : [];
var rdecode = /(%[0-9A-Z]{2})+/g;
var i = 0;
for (; i < cookies.length; i++) {
var parts = cookies[i].split('=');
var cookie = parts.slice(1).join('=');
if (!this.json && cookie.charAt(0) === '"') {
cookie = cookie.slice(1, -1);
}
try {
var name = parts[0].replace(rdecode, decodeURIComponent);
cookie = converter.read ?
converter.read(cookie, name) : converter(cookie, name) ||
cookie.replace(rdecode, decodeURIComponent);
if (this.json) {
try {
cookie = JSON.parse(cookie);
} catch (e) {}
}
if (key === name) {
result = cookie;
break;
}
if (!key) {
result[name] = cookie;
}
} catch (e) {}
}
return result;
}
api.set = api;
api.get = function (key, default_value) {
var value = api.call(api, key);
return value === undefined ? default_value: value;
};
api.getJSON = function () {
return api.apply({
json: true
}, [].slice.call(arguments));
};
api.defaults = {};
api.remove = function (key, attributes) {
api(key, '', extend(attributes, {
expires: -1
}));
};
api.withConverter = init;
return api;
}
return init(function () {});
}));

File diff suppressed because it is too large

View File

@ -0,0 +1,31 @@
<!-- Starting template -->
<script id="start-template" type="text"><View>
<!-- Image with Polygons -->
<View style="padding: 25px;
box-shadow: 2px 2px 8px #AAA">
<Header value="Label the image with polygons"/>
<Image name="img" value="$image"/>
<Text name="text1"
value="Select label, start to click on image"/>
<PolygonLabels name="tag" toName="img">
<Label value="Airbus" background="blue"/>
<Label value="Boeing" background="red"/>
</PolygonLabels>
</View>
<!-- Text with multi-choices -->
<View style="margin-top: 20px; padding: 25px;
box-shadow: 2px 2px 8px #AAA;">
<Header value="Classify the text"/>
<Text name="text2" value="$text"/>
<Choices name="" toName="img" choice="multiple">
<Choice alias="wisdom" value="Wisdom"/>
<Choice alias="long" value="Long"/>
</Choices>
</View>
</View>
</script>

View File

@ -0,0 +1,211 @@
<%#
1 run label-studio start my_project,
2 go to /setup?template_mode=titles
3 copy page source code here
%>
<!-- ---------------------------------------------- --->
<!-- Templates titles --->
<!-- ---------------------------------------------- --->
<div class="ui accordion" id="basic-templates">
<div class="title">
<i class="dropdown icon"></i> Basic config examples
</div>
<div class="content" style="margin-top:-8px">
<!-- Templates categories -->
<div class="ui grid stackable" style="margin: 0 auto;">
<!-- Template basic: audio -->
<div class="three wide column category">
<i class="icon sound" title="Audio sources"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="48">Audio classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="30">Emotion segmentation</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="47">Speaker diarization</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="32">Transcription per region</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="15">Transcription whole audio</a>
</div>
</div>
<!-- Template basic: image -->
<div class="three wide column category">
<i class="icon image" title="Image sources"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="25">Image classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="41">Bbox object detection</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="9">Brush segmentation</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="33">Circular object detector</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="14">Keypoints and landmarks</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="34">Polygon segmentation</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="27">Multi-image classification</a>
</div>
</div>
<!-- Template basic: text -->
<div class="three wide column category">
<i class="icon font" title="Text sources"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="39">Text classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="20">Multi classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="10">Named entity recognition</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="17">Text summarization</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="50">Word alignment</a>
</div>
</div>
<!-- Template basic: html -->
<div class="three wide column category">
<i class="icon code" title="HTML sources"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="6">HTML classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="46">HTML NER tagging</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="43">Dialogs &amp; conversations</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="1">Rate PDF</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="4">Rate website</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="13">Video classifier</a>
</div>
</div>
<!-- Template basic: time-series -->
<div class="three wide column category">
<i class="icon wave square" title="Time series"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="16">Time Series classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="8">Import CSV</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="2">Import JSON</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="21">Segmentation extended</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="23">Multi-step annotation</a>
</div>
</div>
</div>
</div>
</div>
<div class="ui accordion" id="adv-templates">
<div class="title" onclick="$(this).next().toggle('fast'); $(this).find('.dropdown').toggleClass('active')">
<i class="dropdown icon"></i> Advanced config templates
</div>
<div class="content" style="margin-top:-8px">
<!-- Templates categories -->
<div class="ui grid stackable" style="margin: 0 auto;">
<!-- Template advanced: layouts -->
<div class="three wide column category">
<i class="icon eye" title="View layout examples"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="35">Filtering long labels list</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="45">Long text with scrollbar</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="40">Pretty choices</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="3">Sticky header</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="28">Sticky left column</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="5">Three columns</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="37">Two columns</a>
</div>
</div>
<!-- Template advanced: nested -->
<div class="three wide column category">
<i class="icon bullseye" title="Nested examples with conditional behavior"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="11">Conditional classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="19">Three level classification</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="44">Two level classification</a>
</div>
</div>
<!-- Template advanced: per-region -->
<div class="three wide column category">
<i class="icon vector square" title="Per region examples"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="29">Audio regions labeling</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="38">Image bboxes labeling</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="7">Text spans labeling</a>
</div>
</div>
<!-- Template advanced: other -->
<div class="three wide column category">
<i class="icon archive" title="Other sources"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="24">Image &amp; Audio &amp; Text</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="26">Pairwise comparison</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="22">Relations among entities</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="42">Table with key-value</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="18">Table with text fields</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="49">Video timeline segmentation</a>
</div>
</div>
<!-- Template advanced: time-series -->
<div class="three wide column category">
<i class="icon wave square" title="Time series"></i>
<div class="ui item">
<a class="use-template no-go" href="#" data-value="31">Import CSV no time</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="12">Import CSV headless</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="0">Relations between channels</a>
</div><div class="ui item">
<a class="use-template no-go" href="#" data-value="36">Relations with text</a>
</div>
</div>
</div>
</div>
</div>

View File

@ -0,0 +1,33 @@
---
title: Audio Classification
type: templates
order: 401
meta_title: Audio Classification Data Labeling Template
meta_description: Label Studio Audio Classification Template for machine learning and data science data labeling projects.
---
Listen to the audio file and classify it
<img src="/images/screens/audio_classification.png" class="img-template-example" title="Audio Classification" />
## Run
```bash
python server.py -c config.json -l ../examples/audio_classification/config.xml -i ../examples/audio_classification/tasks.json -o output_audio_classes
```
## Config
```html
<View>
  <Header value="Listen to the audio:"></Header>
  <Audio name="audio" value="$url"></Audio>
  <Header value="Select its topic:"></Header>
  <Choices name="label" toName="audio" choice="single-radio" showInline="true">
    <Choice value="Politics"></Choice>
    <Choice value="Business"></Choice>
    <Choice value="Education"></Choice>
    <Choice value="Other"></Choice>
  </Choices>
</View>
```

View File

@ -0,0 +1,36 @@
---
title: Audio Regions
type: templates
order: 402
meta_title: Audio Regions Data Labeling Template
meta_description: Label Studio Audio Regions Template for machine learning and data science data labeling projects.
---
Listen to the audio file and classify it
<img src="/images/screens/audio_regions.png" class="img-template-example" title="Audio Regions" />
<p class="tip">For audio regions to work when you have remote URLs, you need to configure CORS to be wide-open</p>
## Run
```bash
label-studio init --template=audio_regions audio_regions_project
label-studio start audio_regions_project
```
## Config
```html
<View>
  <Header value="Select its topic:"></Header>
  <Labels name="label" toName="audio" choice="multiple">
    <Label value="Politics" background="yellow"></Label>
    <Label value="Business" background="red"></Label>
    <Label value="Education" background="blue"></Label>
    <Label value="Other"></Label>
  </Labels>
  <Header value="Listen to the audio:"></Header>
  <AudioPlus name="audio" value="$url"></AudioPlus>
</View>
```

View File

@ -0,0 +1,34 @@
---
title: Dialogue Analysis
type: templates
order: 301
meta_title: Dialogue Analysis Data Labeling Template
meta_description: Label Studio Dialogue Analysis Template for machine learning and data science data labeling projects.
---
Analyze the chat dialog, classify it, and provide your own answer
<img src="/images/screens/dialogue_analysis.png" class="img-template-example" title="Dialogue Analysis" />
## Run
```bash
label-studio init --template=dialog_analysis dialog_analysis_project
label-studio start dialog_analysis_project
```
## Config
```html
<View>
  <HyperText name="dialog" value="$dialogs"></HyperText>
  <Header value="Rate last answer:"></Header>
  <Choices name="chc-1" choice="single-radio" toName="dialog" showInline="true">
    <Choice value="Bad answer"></Choice>
    <Choice value="Neutral answer"></Choice>
    <Choice value="Good answer"></Choice>
  </Choices>
  <Header value="Your answer:"></Header>
  <TextArea name="answer"></TextArea>
</View>
```

View File

@ -0,0 +1,30 @@
---
title: HTML Documents NER
type: templates
order: 302
meta_title: HTML Document Data Labeling Template
meta_description: Label Studio HTML Document Template for named entity recognition in machine learning and data science data labeling projects.
---
Named entity recognition for HTML documents
<img src="/images/screens/html_document.png" class="img-template-example" title="HTML Documents" />
## Run
```bash
label-studio init --template=html_document html_document_project
label-studio start html_document_project
```
## Config
```html
<View>
  <Labels name="ner" toName="text">
    <Label value="Person"></Label>
    <Label value="Organization"></Label>
  </Labels>
  <HyperText name="text" value="$text"></HyperText>
</View>
```

View File

@ -0,0 +1,30 @@
---
title: Image Object Detection
type: templates
order: 102
meta_title: Image Object Detection Data Labeling Template
meta_description: Label Studio Image Object Detection Template for machine learning and data science data labeling projects.
---
Image bounding box labeling
<img src="/images/screens/image_bbox.png" class="img-template-example" title="Images Bbounding box" />
## Run
```bash
label-studio init --template=image_bbox image_bbox_project
label-studio start image_bbox_project
```
## Config
```html
<View>
  <Image name="img" value="$image"></Image>
  <RectangleLabels name="tag" toName="img">
    <Label value="Planet"></Label>
    <Label value="Moonwalker" background="blue"></Label>
  </RectangleLabels>
</View>
```

View File

@ -0,0 +1,21 @@
---
title: Image Classification
type: templates
order: 101
meta_title: Image Classification Data Labeling Template
meta_description: Label Studio Image Classification Template for machine learning and data science data labeling projects.
---
Image classification with single-choice radio buttons.
## Config
```html
<View>
  <Image name="img" value="$image"></Image>
  <Choices name="tag" toName="img" choice="single-radio">
    <Choice value="Airbus"></Choice>
    <Choice value="Boeing" background="blue"></Choice>
  </Choices>
</View>
```

View File

@ -0,0 +1,30 @@
---
title: Image Ellipse
type: templates
order: 104
meta_title: Image Ellipse Data Labeling Template
meta_description: Label Studio Image Ellipse Template for machine learning and data science data labeling projects.
---
Put ellipses on the image
<img src="/images/screens/image_ellipse.png" class="img-template-example" title="Images Ellipse" />
## Run
```bash
label-studio init image_ellipse_project
label-studio start image_ellipse_project
```
## Config
```html
<View>
  <EllipseLabels name="tag" toName="img">
    <Label value="Blood Cell" />
    <Label value="Stem Cell" />
  </EllipseLabels>
  <Image name="img" value="$image" />
</View>
```

View File

@ -0,0 +1,30 @@
---
title: Image Key Points
type: templates
order: 103
meta_title: Image Keypoints Data Labeling Template
meta_description: Label Studio Image Keypoints Template for machine learning and data science data labeling projects.
---
Key point labeling for images
<img src="/images/screens/image_keypoints.png" class="img-template-example" title="Image Key Points" />
## Run
```bash
label-studio init --template=image_keypoints image_keypoints_project
label-studio start image_keypoints_project
```
## Config
```html
<View>
  <KeyPointLabels name="tag" toName="img" strokewidth="5">
    <Label value="Ear" background="blue"></Label>
    <Label value="Lip" background="red"></Label>
  </KeyPointLabels>
  <Image name="img" value="$image" zoom="true"></Image>
</View>
```

View File

@ -0,0 +1,37 @@
---
title: Image Polygons
type: templates
order: 104
meta_title: Image Polygons Data Labeling Template
meta_description: Label Studio Image Polygons Template for machine learning and data science data labeling projects.
---
Image polygon labeling
<img src="/images/screens/image_polygons.png" class="img-template-example" title="Image Polygons" />
## Run
```bash
label-studio init --template=image_polygons image_polygons_project
label-studio start image_polygons_project
```
## Config
```html
<View style="display: flex">
<View style="width: 100px">
<Header value="Pick label" />
<PolygonLabels name="tag" toName="img" strokewidth="2" pointstyle="circle" pointsize="small" showInline="false">
<Label value="Car" background="blue" />
<Label value="Sign" background="blue" />
<Label value="Person" background="blue" />
<Label value="Tree" background="green" />
</PolygonLabels>
</View>
<View>
<Image name="img" value="$image" showMousePos="true" zoom="true" />
</View>
</View>
```

View File

@ -0,0 +1,28 @@
---
title: Image Segmentation
type: templates
order: 103
meta_title: Image Segmentation Data Labeling Template
meta_description: Label Studio Image Segmentation Template for machine learning and data science data labeling projects.
---
Image segmentation using a brush and producing a mask
## Run
```bash
label-studio init image_segmentation_project
label-studio start image_segmentation_project
```
## Config
```html
<View>
  <BrushLabels name="tag" toName="img">
    <Label value="Planet" />
    <Label value="Moonwalker" background="rgba(255,0,0,0.5)" />
  </BrushLabels>
  <Image name="img" value="$image" zoom="true" zoomControl="true" />
</View>
```

Some files were not shown because too many files have changed in this diff