Commits · huggingface/data-measurements-tool

Hate speech cache

201f0a7

meg (HF staff) committed on Dec 5, 2021

More flexibility in specifying cache directory.

101aa18

meg-huggingface committed on Dec 5, 2021

Scripts to generate cache

db74ba9

meg-huggingface committed on Dec 5, 2021

Standardizing filenaming a bit.

0803ab3

meg-huggingface committed on Dec 5, 2021

More modularizing; npmi and labels

a2ae370

meg-huggingface committed on Dec 5, 2021

Some additional modularizing and caching of the text lengths widget

335424f

meg-huggingface committed on Dec 5, 2021

Modularization and caching of text length widget

85cf91c

meg-huggingface committed on Dec 5, 2021

Removes extraneous debugging print statements

6a9c993

meg-huggingface committed on Dec 5, 2021

Missing a dependency; adding to requirements.txt

6557527

meg-huggingface committed on Dec 5, 2021

Begins modularizing so that each widget can be independently loaded without having a requirement on the ordering of load_or_preparing in app.py. This means that each function corresponding to a widget will check if the variables it depends on have been calculated yet. If not, it will call back to calculate them. Because of the messiness this causes with passing the use_cache variable around, I've now set use_cache as a global variable, set when the DatasetStatisticsCacheClass is initialized, and removed the use_cache arguments appearing in nearly every function.

4b53042

meg-huggingface committed on Dec 5, 2021
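Commit 4b53042 describes an order-independent loading pattern: each widget's function checks whether its inputs exist and computes them if not, and `use_cache` is fixed once when `DatasetStatisticsCacheClass` is initialized. The following is a minimal Python sketch of that idea, not the repository's code. Apart from `DatasetStatisticsCacheClass` and `use_cache`, every name here (`load_or_prepare_tokenized`, `load_or_prepare_vocab`, `vocab_counts.json`, the whitespace tokenizer) is hypothetical, and `use_cache` is kept as an instance attribute rather than the module-level global the commit mentions.

```python
import json
import os
from collections import Counter
from typing import Dict, List, Optional


class DatasetStatisticsCacheClass:
    """Sketch: each widget's method fills in its own dependencies on demand."""

    def __init__(self, cache_dir: str, texts: List[str], use_cache: bool = False):
        # use_cache is fixed once at construction instead of being passed to
        # every method. (The commit describes a module-level global; an
        # instance attribute is used here for simplicity.)
        self.use_cache = use_cache
        self.cache_dir = cache_dir
        self.texts = texts
        self.tokenized: Optional[List[List[str]]] = None
        self.vocab_counts: Optional[Dict[str, int]] = None

    def load_or_prepare_tokenized(self) -> None:
        # Hypothetical stand-in tokenizer: lowercase + whitespace split.
        self.tokenized = [t.lower().split() for t in self.texts]

    def load_or_prepare_vocab(self) -> None:
        path = os.path.join(self.cache_dir, "vocab_counts.json")
        # Cache hit: load the stored result and skip computing dependencies.
        if self.use_cache and os.path.exists(path):
            with open(path) as f:
                self.vocab_counts = json.load(f)
            return
        # Cache miss: call back to compute the inputs this widget needs,
        # so callers never have to run the steps in a particular order.
        if self.tokenized is None:
            self.load_or_prepare_tokenized()
        counts = Counter(tok for toks in self.tokenized for tok in toks)
        self.vocab_counts = dict(counts)
        os.makedirs(self.cache_dir, exist_ok=True)
        with open(path, "w") as f:
            json.dump(self.vocab_counts, f)
```

Checking the cache before computing dependencies means a fully cached widget never triggers the more expensive upstream steps.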

Removing need to keep around base dset for the header widget; now just saving what is shown -- the first n lines of the base dataset -- as a json, and loading if it's cached.

66693d5

meg-huggingface committed on Dec 5, 2021
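Commit 66693d5 replaces keeping the base dataset around with caching only what the header widget shows (the first n rows) as JSON. A minimal sketch of that load-or-prepare step follows, assuming rows are JSON-serializable dicts. The function name `load_or_prepare_header` and the `load_base_dset` callable are hypothetical; passing a callable means the base dataset is only loaded on a cache miss.

```python
import json
import os
from typing import Callable, Dict, List

Row = Dict[str, str]


def load_or_prepare_header(
    cache_path: str,
    load_base_dset: Callable[[], List[Row]],
    n: int = 10,
    use_cache: bool = True,
) -> List[Row]:
    """Return the first n rows shown by the header widget, cached as JSON."""
    # Cache hit: the base dataset is never loaded.
    if use_cache and os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    # Cache miss: load the base dataset once, keep only the displayed rows.
    header = load_base_dset()[:n]
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(header, f)
    return header
```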

Removing any need for a dataframe in expander_general_stats; instead making sure to cache and load the small amount of details needed for this widget. Note I also moved around a couple functions -- same content, just moved -- so that it was easier for me to navigate through the code. I also pulled out a couple of sub-functions from larger functions, again to make the code easier to work with/understand, as well as helping to further modularize so we can limit what needs to be cached.

e1f2cc3

meg-huggingface committed on Dec 5, 2021

Splitting prepare_dataset into preparing the base dataset, and the tokenized dataset. This will help us to have further control over caching and loading data, eventually removing the storage of base dataset.

6af9ef6

meg-huggingface committed on Dec 4, 2021

Updating NLTK requirements due to vulnerability in versions below 3.6.4: contained an inefficient Regular Expression and is vulnerable to regular expression denial of service attacks

937841c

meg-huggingface committed on Dec 4, 2021

Continuing cache minimizing in new repository. Please see https://github.com/huggingface/DataMeasurements for full history

d8ab532

meg-huggingface committed on Dec 4, 2021

:art: add line to file to bump ci

7c5b4e0

yourusername committed on Dec 4, 2021

hate speech18 cache

976b82a

meg-huggingface committed on Dec 4, 2021

hate speech 18 pmi file cache

6508f0b

meg-huggingface committed on Dec 4, 2021

Test to push up a simple cache file

14dcacc

meg-huggingface committed on Dec 4, 2021

:construction_worker: update CI to rebase

6dec358

yourusername committed on Dec 3, 2021

:bug: filter_vocab -> filter_words

78cc3f9

yourusername committed on Dec 3, 2021

:bug: really make sure log_files/ exists

e1cd6af

yourusername committed on Dec 3, 2021

:bug: add log_files dir if not exists

c070f8c

yourusername committed on Dec 3, 2021
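Commits c070f8c and e1cd6af both deal with making sure a `log_files/` directory exists before anything is logged to it. The repository's actual fix is not shown on this page; the sketch below uses a common standard-library idiom under that assumption. `make_file_logger` and the `<name>.log` file naming are hypothetical.

```python
import logging
from pathlib import Path


def make_file_logger(name: str, log_dir: str = "log_files") -> logging.Logger:
    """Return a logger writing to <log_dir>/<name>.log, creating log_dir if needed."""
    # exist_ok=True makes this idempotent and avoids the check-then-create
    # race of `if not os.path.exists(d): os.mkdir(d)`.
    path = Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Don't attach a second file handler if called twice with the same name.
    if not logger.handlers:
        handler = logging.FileHandler(path / f"{name}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```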

:rocket: add app

e88b792

yourusername committed on Dec 3, 2021

:rocket: add app and reqs

64a1ca0

yourusername committed on Dec 3, 2021

:construction_worker: add CI

3c3199f

yourusername committed on Dec 3, 2021

:bug: remove line added by CoPilot

3f4a261

yourusername committed on Dec 3, 2021

:memo: add README.md

07eebf0

yourusername committed on Dec 3, 2021

:tada: init

9b51db9

yourusername committed on Dec 3, 2021

Initial commit

b9430ed (unverified)

Yacine Jernite committed on Jul 20, 2021
