Commit History
Switching slider to selectbox for text lengths
84f1693
Sasha
committed on
Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
ab232f5
wikitext vocab counts cache
95ab841
Changing text lengths plot to a static one, saving to .png
abff13d
Sasha
committed on
Cache cache everywhere
64ba64c
Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
e9b3ffa
zipf fig caches
d72b1cf
Changing aggrid back to dataframe, even if we can't make the width dynamic
ca9634c
Sasha
committed on
Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
14ce207
dup counts cache
91c90ea
Script to run through cache creation
ccfd542
Change to npmi display ordering
5546565
meg-huggingface
committed on
Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
c4e990f
meg-huggingface
committed on
c4 realnewslike train text
ba51326
meg-huggingface
committed on
Loading per-widget. Various changes to streamlit interactions for efficiency.
d3c28ec
meg-huggingface
committed on
More cache; this time adding length_df.feather
e3f7160
meg-huggingface
committed on
More cache
80f1b62
meg-huggingface
committed on
Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
5d4982b
meg-huggingface
committed on
One more flag passing needed for setting live deployment
e122a90
meg-huggingface
committed on
Adds flag for live deployment so that things will not be all recalculated when live.
7c5239c
meg-huggingface
committed on
Trying cache push without the largest files, as they throttle our pushes
58471d2
meg-huggingface
committed on
wiki general stats
ff8aca1
meg-huggingface
committed on
Finishing c4 en train text cache
63ef066
meg-huggingface
committed on
c4 en train text cache
b1e4418
meg-huggingface
committed on
A variety of cache
11c0439
meg-huggingface
committed on
text dset cache
1652bd6
meg-huggingface
committed on
glue cola train sentence cache
edf068c
meg-huggingface
committed on
removing extraneous backup
724d1b1
meg-huggingface
committed on
Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
b28e93b
meg-huggingface
committed on
Hate speech offensive
e530aff
meg-huggingface
committed on
General stats cache
adb962b
dset peek caches
a400d60
c4 en noblocklist cache
fd75df7
c4 en noblocklist cache
13856fd
Starting imdb cache
090cc42
meg-huggingface
committed on
Changing cache naming scheme to make consistent.
cb64f9a
meg-huggingface
committed on
Hate speech cache
201f0a7
More flexibility in specifying cache directory.
101aa18
meg-huggingface
committed on
Scripts to generate cache
db74ba9
meg-huggingface
committed on
Standardizing filenaming a bit.
0803ab3
meg-huggingface
committed on
More modularizing; npmi and labels
a2ae370
meg-huggingface
committed on
Some additional modularizing and caching of the text lengths widget
335424f
meg-huggingface
committed on
Modularization and caching of text length widget
85cf91c
meg-huggingface
committed on
Removes extraneous debugging print statements
6a9c993
meg-huggingface
committed on
Missing a dependency; adding to requirements.txt
6557527
meg-huggingface
committed on
Begins modularizing so that each widget can be independently loaded without having a requirement on the ordering of load_or_preparing in app.py. This means that each function corresponding to a widget will check if the variables it depends on have been calculated yet. If not, it will call back to calculate them. Because of the messiness this causes with passing the use_cache variable around, I've now set use_cache as a global variable, set when the DatasetStatisticsCacheClass is initialized, and removed the use_cache arguments appearing in nearly every function.
4b53042
meg-huggingface
committed on
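The per-widget loading described in commit 4b53042 might look roughly like the sketch below. This is an illustration, not the repository's code: the class name `DatasetStatisticsSketch`, its attributes, the whitespace tokenizer, and the in-memory `cache` dict (standing in for cache files on disk) are all hypothetical, and `use_cache` is modeled as an attribute set once at initialization rather than as a true global.

```python
class DatasetStatisticsSketch:
    """Hypothetical stand-in for DatasetStatisticsCacheClass."""

    def __init__(self, texts, use_cache=False, cache=None):
        self.texts = texts
        # Set once at initialization instead of being passed to every method.
        self.use_cache = use_cache
        # Assumption: a dict stands in for cache files on disk.
        self.cache = cache if cache is not None else {}
        self.tokenized = None
        self.lengths = None

    def load_or_prepare_tokenized(self):
        if self.use_cache and "tokenized" in self.cache:
            self.tokenized = self.cache["tokenized"]
        else:
            # Assumption: whitespace splitting stands in for real tokenization.
            self.tokenized = [text.split() for text in self.texts]
            self.cache["tokenized"] = self.tokenized

    def load_or_prepare_text_lengths(self):
        # The widget checks its own dependency and calls back to compute it
        # if needed, so the caller does not have to order the calls.
        if self.tokenized is None:
            self.load_or_prepare_tokenized()
        self.lengths = [len(tokens) for tokens in self.tokenized]


stats = DatasetStatisticsSketch(["a b c", "d e"])
stats.load_or_prepare_text_lengths()  # works without tokenizing first
print(stats.lengths)
```

With this shape, the app could call any widget's method first and the required intermediate results would be computed or loaded on demand.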
Removing need to keep around base dset for the header widget; now just saving what is shown -- the first n lines of the base dataset -- as a json, and loading if it's cached.
66693d5
meg-huggingface
committed on
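The header-widget change in commit 66693d5 (saving only the first n lines as JSON and loading them when cached) could be sketched as follows. The function name `load_or_prepare_peek`, its parameters, and the file name `dset_peek.json` are assumptions made for illustration; the actual code may differ.

```python
import json
import os
import tempfile


def load_or_prepare_peek(rows, cache_path, n=100, use_cache=True):
    """Return the first n rows, reading them from a JSON cache when present."""
    if use_cache and os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    # Only the rows that are actually displayed are kept and written out.
    peek = rows[:n]
    with open(cache_path, "w") as f:
        json.dump(peek, f)
    return peek


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dset_peek.json")  # hypothetical file name
    rows = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
    first = load_or_prepare_peek(rows, path, n=2)
    # The second call does not need the base dataset; it reads the cache.
    second = load_or_prepare_peek([], path, n=2)
```

Because the second call succeeds with an empty list, the base dataset would no longer need to be kept around once the peek file exists.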
Removing any need for a dataframe in expander_general_stats; instead making sure to cache and load the small amount of details needed for this widget. Note I also moved around a couple functions -- same content, just moved -- so that it was easier for me to navigate through the code. I also pulled out a couple of sub-functions from larger functions, again to make the code easier to work with/understand, as well as helping to further modularize so we can limit what needs to be cached.
e1f2cc3
meg-huggingface
committed on
Splitting prepare_dataset into preparing the base dataset, and the tokenized dataset. This will help us to have further control over caching and loading data, eventually removing the storage of base dataset.
6af9ef6
meg-huggingface
committed on