Pedro Ortiz Suarez

pjox

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

pjox's activity

New activity in commoncrawl/statistics 5 months ago
New activity in oscar-corpus/OSCAR-2301 11 months ago
New activity in oscar-corpus/colossal-oscar-1.0 about 1 year ago
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago

Change foldernames

4
#3 opened over 1 year ago by hac541309
New activity in oscar-corpus/OSCAR-2201 over 1 year ago

Unsafe Files

20
#12 opened over 1 year ago by GetzPro
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

About the number of documents

6
#6 opened over 1 year ago by lixin4ever
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

Changing into Parquet

2
#5 opened over 1 year ago by hac541309
New activity in pjox/dalembert over 1 year ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

Deduplicated English Corpus

2
#3 opened over 1 year ago by conceptofmind

Data hosting on Huggingface

1
#2 opened over 1 year ago by hieuhocnlp

How to download only one language?

2
#1 opened over 1 year ago by musabg
New activity in oscar-corpus/OSCAR-2201 over 1 year ago