ClaudeHu05 committed on
Commit
bd3fd88
1 Parent(s): f40f15d

Create README.md

  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ {}
+ ---
+
+ # Vec2Vec ChIP-atlas hg38
+
+ ## Model Details
+
+ ### Model Description
+
+ This is a Vec2Vec model that maps embedding vectors of natural language text into the embedding space of BED files. It was trained on hg38 ChIP-atlas ATAC-seq data. The natural language metadata came from the experiment list, and its embedding vectors were produced by [`sentence-transformers` with the `microsoft/biogpt` model](https://github.com/UKPLab/sentence-transformers/issues/1824). The BED files were embedded by [Region2Vec](https://huggingface.co/databio/r2v-ChIP-atlas-hg38).
+
+ - **Developed by:** Ziyang "Claude" Hu
+ - **Model type:** Vec2Vec
+ - **Language(s) (NLP):** hg38
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://github.com/databio/geniml
+ - **Paper [optional]:** N/A
+
+ ## Uses
+
+ This model can be used to search BED files with natural language query strings. In the search interface, a query string is first encoded by the same sentence-transformers model, and the resulting vector is then mapped by this Vec2Vec model into the final query vector. The K BED files whose embedding vectors (produced by the same Region2Vec model) are closest to the final query vector are returned as results. The model is limited to hg38, and it is not recommended for data other than ATAC-seq.
+
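The retrieval step described in the Uses section can be illustrated with a minimal, standard-library sketch. Everything here is an assumption made for illustration: the two-dimensional toy vectors, the file names, the `toy_text_to_bed_space` stand-in for the text encoder plus Vec2Vec mapping, and the choice of cosine similarity as the closeness measure (the README does not say which metric the search interface uses). It does not call geniml and is not the project's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_bed_files(query_vector, bed_embeddings, k):
    """Return the names of the k BED files closest to the query vector.

    Closeness is cosine similarity here, which is an assumption.
    """
    ranked = sorted(
        bed_embeddings.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def toy_text_to_bed_space(text_vector):
    """Hypothetical stand-in for the trained Vec2Vec mapping.

    It only swaps the two coordinates so that the example is runnable.
    """
    return [text_vector[1], text_vector[0]]


# Hypothetical BED file embeddings (in practice these come from Region2Vec).
bed_embeddings = {
    "a.bed": [1.0, 0.0],
    "b.bed": [0.0, 1.0],
    "c.bed": [0.7, 0.7],
}

# Hypothetical text embedding (in practice produced by the sentence encoder).
text_vector = [0.1, 0.9]

query_vector = toy_text_to_bed_space(text_vector)
results = top_k_bed_files(query_vector, bed_embeddings, k=2)
print(results)
```

In a real setup the toy mapping would be replaced by the trained Vec2Vec model, and the dictionary by an index over Region2Vec embeddings.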
+ ## How to Get Started with the Model
+
+ You can download the model and start mapping natural language embedding vectors into the BED file embedding space using the following code:
+ ```python
+ from geniml.text2bednn.text2bednn import Vec2VecFNN
+
+ model = Vec2VecFNN("databio/v2v-bioGPT-ATAC-hg38")
+ ```
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ X: