# Web-Based Text Extraction and Retrieval System

This project is a web application that performs Optical Character Recognition (OCR) on uploaded images and highlights user-specified keywords in the extracted text. It supports English and Hindi: users upload an image, extract its text, and then search that text for keywords.

## Features
- **Language Support**: English and Hindi.
- **OCR**: Extracts text from uploaded images.
- **Keyword Search**: Highlights the specified keywords in the extracted text.
- **Multiple Image Formats**: Accepts PNG, JPG, and JPEG images.

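The keyword-highlighting step could look roughly like the sketch below. It is not the project's code: the function name `highlight_keywords`, the case-insensitive matching, and the use of HTML `<mark>` tags (which would need rendering through something like Streamlit's `st.markdown(..., unsafe_allow_html=True)`) are all assumptions.

```python
import html
import re


def highlight_keywords(text: str, keywords: list[str]) -> str:
    """Wrap every case-insensitive occurrence of any keyword in <mark> tags.

    The surrounding text is HTML-escaped so that OCR output containing
    characters such as '<' or '&' cannot break the rendered markup.
    """
    # Drop empty or whitespace-only keywords; an empty pattern would match everywhere.
    terms = [k.strip() for k in keywords if k.strip()]
    if not terms:
        return html.escape(text)
    # Longest terms first so "data science" wins over "data" when both are given.
    terms.sort(key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in terms) + ")", re.IGNORECASE)
    # With one capturing group, re.split places the matched keywords at odd indices.
    parts = pattern.split(text)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)
        out.append(f"<mark>{escaped}</mark>" if i % 2 else escaped)
    return "".join(out)


print(highlight_keywords("OCR turns images into text & OCR output", ["ocr"]))
# → <mark>OCR</mark> turns images into text &amp; <mark>OCR</mark> output
```

Sorting the terms longest-first matters because regex alternation takes the first alternative that matches at a given position, so a shorter keyword listed earlier would otherwise split a longer phrase.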
## Tech Stack
- **Python**
- **Streamlit**: Web interface for interactive image upload and keyword search.
- **Hugging Face Transformers**: Pre-trained model for English text extraction.
- **EasyOCR**: Hindi text extraction from images.
- **PIL (Pillow)**: Loading uploaded images.
- **PyTorch**: Running the model and its tokenizer.
- **NumPy**: Image processing.

## How It Works
### English OCR Flow
1. Upload an image containing text.
2. The application runs a pre-trained Hugging Face model to extract the text.
3. The extracted text is displayed, and users can search it for keywords.
4. Matching keywords are highlighted within the extracted text.

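The four steps can be sketched as a single function with the OCR backend passed in as a callable. This is a hypothetical structure: `run_english_flow`, `extract_text`, and `fake_ocr` are illustrative names, and because the README does not name the Hugging Face model, a real `extract_text` would have to wrap whichever model the app loads. The sketch only reports which keywords occur; rendering the highlights is left to the UI.

```python
from pathlib import Path
from typing import Callable

# The formats listed under Features; anything else is rejected up front.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def run_english_flow(
    filename: str,
    image_bytes: bytes,
    extract_text: Callable[[bytes], str],
    keywords: list[str],
) -> tuple[str, list[str]]:
    """Validate the upload, run OCR, and report which keywords occur in the result."""
    # Step 1: accept only the supported image formats (case-insensitive suffix check).
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {filename}")
    # Step 2: hypothetical OCR hook; the real app would call its Hugging Face model here.
    text = extract_text(image_bytes)
    # Steps 3-4: case-insensitive check for each non-empty keyword.
    lowered = text.lower()
    found = [kw for kw in keywords if kw and kw.lower() in lowered]
    return text, found


def fake_ocr(data: bytes) -> str:
    """Stub standing in for the model; it ignores the image bytes."""
    return "Invoice Total: 42 USD"


text, found = run_english_flow("scan.PNG", b"", fake_ocr, ["total", "tax"])
print(found)  # → ['total']
```

Passing the extractor in keeps the flow testable without downloading model weights: a stub such as `fake_ocr` can replace the model in tests.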
### Hindi OCR Flow
1. Upload an image containing Hindi text.
2. EasyOCR detects and extracts the Hindi text from the image.
3. Users can search for Hindi keywords, which are highlighted in the extracted text.

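By default, EasyOCR's `Reader.readtext` returns a list of `(bounding_box, text, confidence)` entries rather than a single string, so the fragments have to be joined before they can be searched. The sketch below does this on hard-coded sample output; the helper names, the space separator, and the 0.3 confidence cut-off are assumptions, not values taken from this project. In the app, the input would come from a call such as `easyocr.Reader(["hi", "en"]).readtext(image)`.

```python
# Each EasyOCR result entry (default detail=1) is (bounding_box, text, confidence).
Detection = tuple[object, str, float]


def join_detections(results: list[Detection], min_confidence: float = 0.3) -> str:
    """Join recognised fragments with spaces, dropping low-confidence ones.

    min_confidence is an illustrative threshold, not a value from this project.
    """
    return " ".join(text for _bbox, text, conf in results if conf >= min_confidence)


def find_keyword_spans(text: str, keyword: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every non-overlapping occurrence of keyword.

    Uses plain substring search instead of regex word boundaries: Devanagari
    vowel signs are combining marks, which Python's re module may not treat
    as word characters, so a boundary-based pattern can split a Hindi word.
    """
    spans: list[tuple[int, int]] = []
    if not keyword:
        return spans
    start = text.find(keyword)
    while start != -1:
        spans.append((start, start + len(keyword)))
        start = text.find(keyword, start + len(keyword))
    return spans


# Hard-coded sample in the shape EasyOCR returns; coordinates are made up.
results = [
    ([[0, 0], [80, 0], [80, 20], [0, 20]], "नमस्ते", 0.91),
    ([[90, 0], [160, 0], [160, 20], [90, 20]], "दुनिया", 0.87),
    ([[0, 30], [40, 30], [40, 50], [0, 50]], "??", 0.05),
]
text = join_detections(results)
print(text)  # → नमस्ते दुनिया
print(find_keyword_spans(text, "दुनिया"))  # → [(7, 13)]
```

The returned offsets can then drive the same kind of highlighting used for English text.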
## Installation

1. **Clone the Repository**:
   ```bash
   git clone https://github.com/SrisuryaTeja/Web-Based-Text-Extraction-and-Retrieval-System
   cd Web-Based-Text-Extraction-and-Retrieval-System
   ```

2. **Create and Activate a Virtual Environment**:
   ```bash
   python -m venv myenv
   source myenv/bin/activate  # On Windows: myenv\Scripts\activate
   ```

3. **Install Dependencies**:
   Install the required packages listed in the `requirements.txt` file:
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the Application**:
   ```bash
   streamlit run app.py
   ```