Add some TODOs, update readme.md, update evaluation.ipynb
Files changed:
- Dockerfile +1 -0
- README.md +13 -2
- commafixer/src/fixer.py +2 -1
- notebooks/evaluation.ipynb +0 -0
Dockerfile
CHANGED
@@ -27,6 +27,7 @@ COPY --chown=user . .
 FROM base as test
 
 RUN pip install .[test]
+# TODO don't run all at once because of memory errors?
 RUN python -m pytest tests
 
 FROM python:3.10-slim as deploy
README.md
CHANGED
@@ -27,6 +27,7 @@ Note that you might have to
 `sudo service docker start`
 first.
 
+
 The application should then be available at http://localhost:8000.
 For the API, see the `openapi.yaml` file.
 Docker-compose mounts a volume and listens to changes in the source code, so the application will be reloaded and
@@ -35,7 +36,15 @@ reflect them.
 We use multi-stage builds to reduce the image size, ensure flexibility in requirements and that tests are run before
 each deployment.
 However, while it does reduce the size by nearly 3GB, the resulting image still contains deep learning libraries and
-pre-downloaded models, and will take around
+pre-downloaded models, and will take around 9GB of disk space.
+
+NOTE: Since the service is hosting two large deep learning models, there might be memory issues depending on your
+machine, where the terminal running
+docker would simply crash.
+Should that happen, you can try increasing resources allocated to docker, or splitting commands in the docker file,
+e.g., running tests one by one.
+If everything fails, you can still use the hosted huggingface hub demo, or follow the steps below and run the app
+locally without Docker.
 
 Alternatively, you can setup a python environment by hand. It is recommended to use a virtualenv. Inside one, run
 ```bash
@@ -43,6 +52,8 @@ pip install -e .[test]
 ```
 the `[test]` option makes sure to install test dependencies.
 
+Then, run `python app.py` or `uvicorn --host 0.0.0.0 --port 8000 "app:app" --reload` to run the application.
+
 If you intend to perform training and evaluation of deep learning models, install also using the `[training]` option.
 
 ### Running tests
@@ -91,7 +102,7 @@ dataset are as follows:
 | Model    | precision | recall | F1   | support |
 |----------|-----------|--------|------|---------|
 | baseline | 0.79      | 0.72   | 0.75 | 10079   |
-| ours*    | 0.
+| ours*    | 0.84      | 0.84   | 0.84 | 10079   |
 *details of the fine-tuning process in the next section.
 
 We treat each comma as one token instance, as opposed to the original paper, which NER-tags the whole multiple-token
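The README's advice about splitting Docker commands to avoid memory spikes could look like this in the test stage of the Dockerfile; the test module names below are hypothetical, not the repository's actual layout:

```dockerfile
FROM base as test
RUN pip install .[test]
# Each RUN spawns a separate pytest process, so memory from one test
# module is released before the next module starts.
RUN python -m pytest tests/test_fixer.py
RUN python -m pytest tests/test_app.py
```

The trade-off is more image layers and slightly slower builds in exchange for a lower peak memory footprint during the test stage.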
commafixer/src/fixer.py
CHANGED
@@ -52,7 +52,8 @@ class CommaFixer:
         tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
         model = PeftModel.from_pretrained(inference_model, model_name)
         model = model.merge_and_unload()  # Join LoRa matrices with the main model for faster inference
-
+        # TODO batch, and move to CUDA if available
+        return model.eval(), tokenizer
 
 
 def _fix_commas_based_on_labels_and_offsets(
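The `_fix_commas_based_on_labels_and_offsets` helper in the diff suggests that comma insertion is driven by per-token labels plus character offsets from the tokenizer. A minimal sketch of that idea (the function name, label scheme, and offsets here are assumptions for illustration, not the project's actual implementation):

```python
def fix_commas_from_labels(text, offsets, labels):
    """Insert a comma after each token whose label marks a following comma.

    offsets: (start, end) character spans of tokens within `text`.
    labels:  one label per token; "B-COMMA" means a comma should follow it.
    """
    out = []
    prev_end = 0
    for (start, end), label in zip(offsets, labels):
        # Copy everything up to the end of this token (including any
        # whitespace between tokens), then append a comma if labelled.
        out.append(text[prev_end:end])
        if label == "B-COMMA" and text[end:end + 1] != ",":
            out.append(",")
        prev_end = end
    out.append(text[prev_end:])  # trailing text after the last token
    return "".join(out)


print(fix_commas_from_labels(
    "I came I saw I conquered",
    [(0, 1), (2, 6), (7, 8), (9, 12), (13, 14), (15, 24)],
    ["O", "B-COMMA", "O", "B-COMMA", "O", "O"],
))  # → "I came, I saw, I conquered"
```

Treating each comma as one token instance, as the README describes, maps naturally onto this per-token labelling.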
notebooks/evaluation.ipynb
CHANGED
The diff for this file is too large to render.