	# AI Text Steganography
	## Description
	- This is the baseline implementation of AI Text Steganography, our final project for the Software Designing and Applied Information Security courses at HCMUS-VNU.
	- Our project focuses on hiding data inside a text sequence generated by LLMs (e.g. GPT-2).
	- We took inspiration from [Kirchenbauer et al.](https://arxiv.org/abs/2301.10226).
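To make the idea concrete, here is a toy sketch of hiding bits in a token sequence by partitioning the vocabulary with a hash of the previous token. It is only an illustration loosely modeled on the vocabulary-partition idea in Kirchenbauer et al., not the algorithm implemented in this repository. `VOCAB`, `partition`, `hide`, `reveal`, and the `key` value are all hypothetical, and no LLM is involved: a real system would let the model choose a likely token from the selected group.

```python
import hashlib
import random

# Hypothetical toy vocabulary; a real system would use the LLM's tokenizer.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "tree"]


def partition(prev_token: str, key: str = "secret"):
    """Split VOCAB into two groups, seeded by the previous token and a key."""
    digest = hashlib.sha256((key + prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # group 0, group 1


def hide(bits, start: str = "the"):
    """Emit one token per bit, always taken from the group that encodes it."""
    tokens = [start]
    for bit in bits:
        groups = partition(tokens[-1])
        # Assumption: we simply take the first token of the group.
        tokens.append(groups[bit][0])
    return tokens


def reveal(tokens):
    """Recover the bits by recomputing each partition from the previous token."""
    bits = []
    for prev, tok in zip(tokens, tokens[1:]):
        group0, _ = partition(prev)
        bits.append(0 if tok in group0 else 1)
    return bits


message = [1, 0, 1, 1, 0]
stego_tokens = hide(message)
recovered = reveal(stego_tokens)
```

Because the receiver can rebuild the same partitions from the shared key and the preceding token, `recovered` equals `message` without any side channel.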
	## Members
	- Tran Nam Khanh
	- Phan Le Dac Phu
	## Installation
	1. Clone this repository:
	```Bash
	git clone https://github.com/trnKhanh/ai-text-steganography.git
	cd ai-text-steganography
	```
	2. (Optional) Create a new conda environment:
	```Bash
	conda create -n ai-text-steganography python=3.10
	conda activate ai-text-steganography
	```
	3. Install requirements:
	```Bash
	pip install -r requirements.txt
	```
	## Usage
	- Gradio demo:
	```Bash
	python demo.py
	```
	- REST API:
	```Bash
	python api.py
	```
	- Show the help message of the command-line interface:
	```Bash
	python main.py -h
	```
	## Configuration
	- `config.ini` is the project's configuration file. It uses a modified version of the syntax read by Python's `configparser` module: every key-value pair is written as `key = type:value`, where `type` is currently one of `int`, `float`, or `str`.
	- Details on config:
	  - `server`: parameters for the REST API.
	  - `models.names`: names of the allowed LLMs, as listed on [Hugging Face](https://huggingface.co/models).
	  - `models.params`: parameters used to load the LLMs.
	  - `encrypt.default`: default parameters for the encryption algorithm.
	  - `decrypt.default`: default parameters for the decryption algorithm.
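The `key = type:value` convention could be read roughly as follows. This is a minimal sketch, not the project's actual loader: the keys in `SAMPLE` (`host`, `port`, `gamma`) and the helpers `parse_typed` and `load_config` are hypothetical and chosen only for illustration.

```python
import configparser

# Hypothetical config text; key names and values are illustrative only.
SAMPLE = """
[server]
host = str:0.0.0.0
port = int:8000

[encrypt.default]
gamma = float:1.0
"""

# The three value types the README says are supported.
CASTS = {"int": int, "float": float, "str": str}


def parse_typed(raw: str):
    """Turn a 'type:value' string into a value of the named type."""
    type_name, _, value = raw.partition(":")
    type_name = type_name.strip()
    if type_name not in CASTS:
        raise ValueError(f"unsupported type: {type_name!r}")
    return CASTS[type_name](value)


def load_config(text: str) -> dict:
    """Parse INI text and cast every value according to its type prefix."""
    # Only '=' separates key and value, so the ':' stays inside the value.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(text)
    return {
        section: {key: parse_typed(raw) for key, raw in parser[section].items()}
        for section in parser.sections()
    }


config = load_config(SAMPLE)
```

Restricting the delimiter to `=` is a design choice of this sketch: by default `configparser` also accepts `:` as a key-value separator, which is easy to confuse with the type prefix.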
	## Notes on implementation
	- Because our resources are limited, we load multiple models on the same machine (see `model_factory.py`):
	  - Each model is first loaded onto the `load_device` (e.g. CPU).
	  - When a request needs a specific model, that model is moved to the `run_device` (e.g. GPU) for inference.
	- As a result, only one model can run inference at a time. This lets users choose among different LLMs within our limited resources, but it forces the API to be synchronous.
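The device-swapping scheme described above can be sketched as follows. This is an assumption-laden illustration rather than the contents of `model_factory.py`: `FakeModel` stands in for a real LLM so the example runs without a GPU, and the `ModelFactory` class, its methods, and the model names are hypothetical.

```python
import threading


class FakeModel:
    """Stand-in for an LLM; it only tracks which device it lives on."""

    def __init__(self, name: str, device: str):
        self.name = name
        self.device = device

    def to(self, device: str) -> "FakeModel":
        self.device = device
        return self


class ModelFactory:
    """Keeps every model on load_device and at most one on run_device."""

    def __init__(self, names, load_device="cpu", run_device="cuda"):
        self.load_device = load_device
        self.run_device = run_device
        # Every model starts on the load device.
        self.models = {n: FakeModel(n, load_device) for n in names}
        self.active = None
        self.lock = threading.Lock()  # serializes requests

    def acquire(self, name: str) -> FakeModel:
        """Move the requested model to run_device, evicting the previous one."""
        if self.active is not None and self.active != name:
            self.models[self.active].to(self.load_device)
        self.active = name
        return self.models[name].to(self.run_device)

    def run(self, name: str, prompt: str) -> str:
        # Holding the lock for the whole request is what makes the API synchronous.
        with self.lock:
            model = self.acquire(name)
            return f"{model.name}@{model.device}: {prompt}"


factory = ModelFactory(["gpt2", "gpt2-medium"])
first = factory.run("gpt2", "hello")
second = factory.run("gpt2-medium", "hello")
```

After the second call, the first model is back on the load device, so only one model occupies the run device at any time.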
	## TODO list
	- [x] Baseline code.
	- [x] CLI.
	- [x] Hashing schemes.
	- [x] REST API.
	- [x] Basic Demo.
	- [ ] Statistical experiments.
	- [ ] Attack strategies:
	  - [ ] White-box.
	  - [ ] Black-box.