# AI Text Steganography

## Description
- This is the baseline implementation of AI Text Steganography, our final project for the Software Designing and Applied Information Security courses at HCMUS-VNU.
- The project focuses on hiding data inside a text sequence generated by LLMs (e.g. GPT-2).
- We took inspiration from [Kirchenbauer et al.](https://arxiv.org/abs/2301.10226).

## Members
- Tran Nam Khanh
- Phan Le Dac Phu

## Installation
1. Clone this repository:
   ```Bash
   git clone https://github.com/trnKhanh/ai-text-steganography.git
   cd ai-text-steganography
   ```
2. (Optional) Create a new conda environment:
   ```Bash
   conda create -n ai-text-steganography python=3.10
   conda activate ai-text-steganography
   ```
3. Install the requirements:
   ```Bash
   pip install -r requirements.txt
   ```

## Usage
- Gradio demo:
  ```Bash
  python demo.py
  ```
- REST API:
  ```Bash
  python api.py
  ```
- To see the help message of the command-line interface, run:
  ```Bash
  python main.py -h
  ```

## Documentation
- To access the documentation for the REST API, launch the REST API and go to

## Configuration
- `config.ini` is the config file of the project. We use a modified version of the `configparser` syntax: every key-value pair is written as `key = type:value`. Currently, `type` can only be `int`, `float` or `str`.
- Details on config:
  - `server`: parameters for the REST API.
  - `models.names`: names of the LLMs allowed. These must match the names defined on [Hugging Face](https://huggingface.co/models).
  - `models.params`: parameters used to load the LLMs.
  - `encrypt.default`: default parameters for the encryption algorithm.
  - `decrypt.default`: default parameters for the decryption algorithm.

## Notes on implementation
- Because of limited resources, we load multiple models on the same machine (the implementation is in `model_factory.py`):
  - Each model is first loaded onto the `load_device` (e.g. CPU).
  - When a request asks for a specific model, that model is moved to the `run_device` (e.g. GPU) for inference.
- Therefore, only one model can be used for inference at a time. This lets us make the most of our limited resources while still allowing users to choose between different LLMs, but it forces the API to be synchronous.

## TODO lists
- [x] Baseline code.
- [x] CLI.
- [x] Hashing schemes.
- [x] REST API.
- [x] Basic demo.
- [ ] Statistical experiments.
- [ ] Attack strategies
  - [ ] White-box
  - [ ] Black-box
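
As an illustration of the `key = type:value` syntax described under Configuration, here is a minimal sketch of how such values could be parsed on top of the standard `configparser` module. It is not the project's actual loader: the helper names `parse_typed_value` and `load_typed_config`, the keys `port`, `host` and `gamma`, and the use of `server` and `encrypt.default` as INI section headers are all assumptions made for the example.

```python
import configparser

# Converters for the three documented types (int, float, str).
CASTS = {"int": int, "float": float, "str": str}


def parse_typed_value(raw: str):
    """Convert a 'type:value' string into a Python value (hypothetical helper)."""
    type_name, sep, value = raw.partition(":")
    type_name = type_name.strip()
    if not sep or type_name not in CASTS:
        raise ValueError(f"expected 'type:value' with a known type, got {raw!r}")
    return CASTS[type_name](value.strip())


def load_typed_config(text: str) -> dict:
    """Parse INI text and cast every value according to its type prefix."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: {key: parse_typed_value(val) for key, val in parser[section].items()}
        for section in parser.sections()
    }


# Example content; the keys below are made up for illustration.
EXAMPLE = """
[server]
port = int:8000
host = str:0.0.0.0

[encrypt.default]
gamma = float:0.5
"""

config = load_typed_config(EXAMPLE)
```

With this sketch, `config["server"]["port"]` is an `int` and `config["encrypt.default"]["gamma"]` is a `float`, while an unknown prefix such as `bool:true` raises a `ValueError`.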
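
The device-swapping scheme from the implementation notes can be sketched roughly as below. This is not the code in `model_factory.py`: `FakeModel` is a stand-in that only records which device it is on (a real model would be moved with something like PyTorch's `.to(device)`), and the class name `SingleSlotModelPool`, its `run` method, and the model names are assumptions chosen for the example.

```python
import threading


class FakeModel:
    """Stand-in for a real LLM; it only tracks which device it lives on."""

    def __init__(self, name: str, device: str):
        self.name = name
        self.device = device

    def to(self, device: str) -> "FakeModel":
        self.device = device
        return self


class SingleSlotModelPool:
    """Keeps every model on load_device and at most one on run_device."""

    def __init__(self, names, load_device="cpu", run_device="cuda"):
        self.load_device = load_device
        self.run_device = run_device
        # All models start on the cheap device.
        self.models = {n: FakeModel(n, load_device) for n in names}
        self.active = None  # name of the model currently on run_device
        self.lock = threading.Lock()

    def run(self, name, fn):
        """Run fn(model) with the requested model placed on run_device."""
        with self.lock:  # requests are served one at a time
            model = self.models[name]  # fail early on unknown names
            if self.active != name:
                if self.active is not None:
                    # Evict the previous model back to load_device.
                    self.models[self.active].to(self.load_device)
                model.to(self.run_device)
                self.active = name
            return fn(model)


pool = SingleSlotModelPool(["gpt2", "gpt2-large"])
device_seen = pool.run("gpt2", lambda m: m.device)
pool.run("gpt2-large", lambda m: m.device)
```

The lock is what makes the sketch synchronous: a second request has to wait until the first has finished with the single `run_device` slot, which mirrors the trade-off described in the notes.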