# AI Text Steganography
## Description
- This is the baseline implementation of AI Text Steganography, our final project for the Software Design and Applied Information Security courses at HCMUS-VNU.
- The project focuses on hiding data inside text sequences generated by LLMs (e.g. GPT-2).
- We took inspiration from the watermarking scheme of [Kirchenbauer et al.](https://arxiv.org/abs/2301.10226).
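
For background, here is a minimal sketch of the green-list logit biasing that Kirchenbauer et al. propose, assuming PyTorch; it illustrates the idea the project builds on, not this repository's actual encryption algorithm, and the parameters `gamma` and `delta` follow the paper rather than our code:

```python
# Minimal sketch of the green-list logit bias from Kirchenbauer et al.
# (arXiv:2301.10226); illustrative only, not this repo's algorithm.
import torch

def bias_logits(logits, prev_token_id, gamma=0.5, delta=2.0):
    """Boost a pseudorandom 'green list' of tokens, seeded by the previous token."""
    vocab_size = logits.shape[-1]
    generator = torch.Generator().manual_seed(prev_token_id)
    # Partition the vocabulary: the first gamma fraction of a seeded
    # permutation forms the green list.
    permutation = torch.randperm(vocab_size, generator=generator)
    green = permutation[: int(gamma * vocab_size)]
    logits[green] += delta  # green tokens become more likely to be sampled
    return logits
```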
## Members
- Tran Nam Khanh
- Phan Le Dac Phu
## Installation | |
1. Clone this repository: | |
```Bash | |
git clone https://github.com/trnKhanh/ai-text-steganography.git | |
cd ai-text-steganography | |
``` | |
2. (Optional) Create new conda environment: | |
```Bash | |
conda create -n ai-text-steganography python=3.10 | |
conda activate ai-text-steganography | |
``` | |
3. Install requirements: | |
```Bash | |
pip install -r requirements.txt | |
``` | |
## Usage
- Gradio demo:
```bash
python demo.py
```
- REST API:
```bash
python api.py
```
- See the help message of the command-line interface:
```bash
python main.py -h
```
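
For illustration, a client call to the running API might look like the sketch below; the port, route, and payload fields are hypothetical placeholders (the actual endpoints are defined in `api.py`):

```python
# Hypothetical client call: the port (8000), route ("/encrypt"), and
# payload fields below are illustrative guesses; see api.py for the
# actual endpoints and parameters.
import requests

response = requests.post(
    "http://localhost:8000/encrypt",
    json={
        "prompt": "Once upon a time",  # prompt fed to the LLM
        "msg": "secret message",       # data to hide in the generated text
        "gen_model": "gpt2",           # should match models.names in config.ini
    },
)
print(response.json())
```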
## Configuration
- `config.ini` is the config file of the project. We use a modified syntax of the `configparser` package: every key-value pair follows the form `key = type:value`, where `type` can currently only be `int`, `float`, or `str`.
- Details on the config sections:
  - `server`: parameters for the REST API.
  - `models.names`: names of the allowed LLMs. Note that these follow the model names defined on [Hugging Face](https://huggingface.co/models).
  - `models.params`: parameters used to load the LLMs.
  - `encrypt.default`: default parameters for the encryption algorithm.
  - `decrypt.default`: default parameters for the decryption algorithm.
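
To make the syntax concrete, the sketch below parses a sample snippet in this `key = type:value` style with `configparser`; the section name comes from the list above, but the keys and values inside it are made-up examples, and the parsing code is a minimal illustration rather than the project's actual loader:

```python
# Minimal sketch of parsing the `key = type:value` convention with
# configparser; the keys/values in SAMPLE are illustrative, not taken
# from the real config.ini.
import configparser

SAMPLE = """
[encrypt.default]
delta = float:2.0
window_length = int:1
hash_scheme = str:sha_left
"""

CASTS = {"int": int, "float": float, "str": str}

def parse_typed_value(raw: str):
    """Split `type:value` and cast the value to the declared type."""
    type_name, _, value = raw.partition(":")
    return CASTS[type_name](value)

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
for key, raw in parser.items("encrypt.default"):
    print(key, "->", repr(parse_typed_value(raw)))
```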
## Notes on implementation
- Because of limited resources, we load multiple models on the same machine (the implementation is in `model_factory.py`):
  - Each model is first loaded to the `load_device` (e.g. CPU).
  - When a request needs a specific model, that model is moved to the `run_device` (e.g. GPU) for inference.
  - Therefore, only one model can run inference at a time. This lets us make the most of our limited resources while still letting users choose among different LLMs, but it forces the API to be synchronous.
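
A minimal sketch of this pattern is given below, assuming PyTorch and Hugging Face `transformers`; the class and method names are illustrative and do not reproduce the actual code in `model_factory.py`:

```python
# Sketch of the load-once / move-for-inference pattern described above.
# Class and method names are illustrative, not from model_factory.py.
import threading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class ModelFactory:
    def __init__(self, names, load_device="cpu", run_device="cuda"):
        self.load_device = torch.device(load_device)
        self.run_device = torch.device(run_device)
        # Load every allowed model once onto the (cheap) load device.
        self.models = {
            name: AutoModelForCausalLM.from_pretrained(name).to(self.load_device)
            for name in names
        }
        self.tokenizers = {name: AutoTokenizer.from_pretrained(name) for name in names}
        # Only one model may occupy the run device at a time, so
        # inference requests are serialized by this lock.
        self.lock = threading.Lock()

    def run(self, name, fn):
        """Move `name` to the run device, call `fn(model, tokenizer)`, move it back."""
        with self.lock:
            model = self.models[name].to(self.run_device)
            try:
                return fn(model, self.tokenizers[name])
            finally:
                self.models[name].to(self.load_device)
```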
## TODO list
- [x] Baseline code.
- [x] CLI.
- [x] Hashing schemes.
- [x] REST API.
- [x] Basic demo.
- [ ] Statistical experiments.
- [ ] Attack strategies:
  - [ ] White-box.
  - [ ] Black-box.