Upload 12 files
Browse files
- LICENSE +400 -0
- README.md +191 -8
- app (1).py +122 -0
- config.py +59 -0
- dataset.py +224 -0
- gitattributes +5 -0
- index.html +139 -0
- inference_onnx.py +63 -0
- loss.py +145 -0
- main.py +131 -0
- requirements (1).txt +18 -0
- sample.wav +0 -0
LICENSE
ADDED
@@ -0,0 +1,400 @@
Attribution-NonCommercial 4.0 International

=======================================================================

Creative Commons Corporation ("Creative Commons") is not a law firm and
does not provide legal services or legal advice. Distribution of
Creative Commons public licenses does not create a lawyer-client or
other relationship. Creative Commons makes its licenses and related
information available on an "as-is" basis. Creative Commons gives no
warranties regarding its licenses, any material licensed under their
terms and conditions, or any related information. Creative Commons
disclaims all liability for damages resulting from their use to the
fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and
conditions that creators and other rights holders may use to share
original works of authorship and other material subject to copyright
and certain other rights specified in the public license below. The
following considerations are for informational purposes only, are not
exhaustive, and do not form part of our licenses.

     Considerations for licensors: Our public licenses are
     intended for use by those authorized to give the public
     permission to use material in ways otherwise restricted by
     copyright and certain other rights. Our licenses are
     irrevocable. Licensors should read and understand the terms
     and conditions of the license they choose before applying it.
     Licensors should also secure all rights necessary before
     applying our licenses so that the public can reuse the
     material as expected. Licensors should clearly mark any
     material not subject to the license. This includes other CC-
     licensed material, or material used under an exception or
     limitation to copyright. More considerations for licensors:
     wiki.creativecommons.org/Considerations_for_licensors

     Considerations for the public: By using one of our public
     licenses, a licensor grants the public permission to use the
     licensed material under specified terms and conditions. If
     the licensor's permission is not necessary for any reason--for
     example, because of any applicable exception or limitation to
     copyright--then that use is not regulated by the license. Our
     licenses grant only permissions under copyright and certain
     other rights that a licensor has authority to grant. Use of
     the licensed material may still be restricted for other
     reasons, including because others have copyright or other
     rights in the material. A licensor may make special requests,
     such as asking that all changes be marked or described.
     Although not required by our licenses, you are encouraged to
     respect those requests where reasonable. More considerations
     for the public:
     wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution-NonCommercial 4.0 International Public
License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-NonCommercial 4.0 International Public License ("Public
License"). To the extent this Public License may be interpreted as a
contract, You are granted the Licensed Rights in consideration of Your
acceptance of these terms and conditions, and the Licensor grants You
such rights in consideration of benefits the Licensor receives from
making the Licensed Material available under these terms and
conditions.

Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  d. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  e. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  f. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  g. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  h. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  i. NonCommercial means not primarily intended for or directed towards
     commercial advantage or monetary compensation. For purposes of
     this Public License, the exchange of the Licensed Material for
     other material subject to Copyright and Similar Rights by digital
     file-sharing or similar means is NonCommercial provided there is
     no payment of monetary compensation in connection with the
     exchange.

  j. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  k. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  l. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.

Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part, for NonCommercial purposes only; and

            b. produce, reproduce, and Share Adapted Material for
               NonCommercial purposes only.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties, including when
          the Licensed Material is used other than for NonCommercial
          purposes.

Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.

       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

       4. If You Share Adapted Material You produce, the Adapter's
          License You apply must not prevent recipients of the Adapted
          Material from complying with this Public License.

Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database for NonCommercial purposes
     only;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material; and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.

Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.

Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.

Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.

Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.

=======================================================================

Creative Commons is not a party to its public
licenses. Notwithstanding, Creative Commons may elect to apply one of
its public licenses to material it publishes and in those instances
will be considered the "Licensor." The text of the Creative Commons
public licenses is dedicated to the public domain under the CC0 Public
Domain Dedication. Except for the limited purpose of indicating that
material is shared under a Creative Commons public license or as
otherwise permitted by the Creative Commons policies published at
creativecommons.org/policies, Creative Commons does not authorize the
use of the trademark "Creative Commons" or any other trademark or logo
of Creative Commons without its prior written consent including,
without limitation, in connection with any unauthorized modifications
to any of its public licenses or any other arrangements,
understandings, or agreements concerning use of licensed material. For
the avoidance of doubt, this paragraph does not form part of the
public licenses.

Creative Commons may be contacted at creativecommons.org.
README.md
CHANGED
@@ -1,13 +1,196 @@
---
title: FRN
emoji: 📉
colorFrom: gray
colorTo: red
sdk: streamlit
pinned: true
app_file: app.py
sdk_version: 1.10.0
python_version: 3.8
---

# FRN - Full-band Recurrent Network Official Implementation

**Improving performance of real-time full-band blind packet-loss concealment with predictive network - ICASSP 2023**

[![Generic badge](https://img.shields.io/badge/arXiv-2211.04071-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2211.04071)
[![Generic badge](https://img.shields.io/github/stars/Crystalsound/FRN?color=yellow&label=FRN&logo=github&style=flat-square)](https://github.com/Crystalsound/FRN/)
[![Generic badge](https://img.shields.io/github/last-commit/Crystalsound/FRN?color=blue&label=last%20commit&style=flat-square)](https://github.com/Crystalsound/FRN/commits)

## License and citation

This repository is released under the CC BY-NC 4.0 license as found in the LICENSE file.

If you use our software, please cite as below.
For future queries, please contact [[email protected]](mailto:[email protected]).

Copyright © 2022 NAMI TECHNOLOGY JSC, Inc. All rights reserved.

```
@misc{Nguyen2022ImprovingPO,
    title={Improving performance of real-time full-band blind packet-loss concealment with predictive network},
    author={Viet-Anh Nguyen and Anh H. T. Nguyen and Andy W. H. Khong},
    year={2022},
    eprint={2211.04071},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

# 1. Results

Our model achieves a significant gain over the baselines. Here, we include the predicted packet-loss-concealment mean opinion score (PLCMOS) computed with Microsoft's [PLCMOS](https://github.com/microsoft/PLC-Challenge/tree/main/PLCMOS) tool. Please refer to our paper for more benchmarks.

| Model   | PLCMOS    |
|---------|-----------|
| Input   | 3.517     |
| tPLC    | 3.463     |
| TFGAN   | 3.645     |
| **FRN** | **3.655** |

We also provide several audio samples at [https://crystalsound.github.io/FRN/](https://crystalsound.github.io/FRN/) for comparison.

# 2. Installation

## Setup

### Clone the repo

```
$ git clone https://github.com/Crystalsound/FRN.git
$ cd FRN
```

### Install dependencies

* Our implementation requires the `libsndfile` library for the Python package `soundfile`. On Ubuntu, it can easily be installed using `apt-get`:
```
$ apt-get update && apt-get install libsndfile-dev
```
* Create a Python 3.8 environment. Conda is recommended:
```
$ conda create -n frn python=3.8
$ conda activate frn
```
* Install the requirements:
```
$ pip install -r requirements.txt
```

# 3. Data preparation

In our paper, we conduct experiments on the [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) dataset.

* Download and extract the dataset:
```
$ wget http://www.udialogue.org/download/VCTK-Corpus.tar.gz -O data/vctk/VCTK-Corpus.tar.gz
$ tar -zxvf data/vctk/VCTK-Corpus.tar.gz -C data/vctk/ --strip-components=1
```

After extraction, your `./data` directory should look like this:

```
.
|--data
    |--vctk
        |--wav48
            |--p225
                |--p225_001.wav
                ...
        |--train.txt
        |--test.txt
```
* To load the datasets, text files that contain the training and testing audio paths are required. We have prepared `train.txt` and `test.txt` files in the `./data/vctk` directory.

# 4. Run the code

## Configuration

`config.py` is the most important file. Here, you can find all the configurations related to experiment setups, datasets, models, training, testing, etc. Although the config file has been commented thoroughly, we recommend reading our paper to fully understand each parameter.

## Training

* Adjust the training hyperparameters in `config.py`. We provide the pretrained predictor in `lightning_logs/predictor` as stated in our paper. The FRN model can also be trained entirely from scratch and will work as well; in this case, instantiate `PLCModel(..., pred_ckpt_path=None)`.
* Run `main.py`:
```
$ python main.py --mode train
```
* Each run creates a version in `./lightning_logs`, where the model checkpoint and hyperparameters are saved. To continue training from one of these versions, set the `--version` argument of the above command to the desired version number. For example:
```
# resume from version 0
$ python main.py --mode train --version 0
```
* To monitor the training curves and inspect model output visualizations, run TensorBoard:
```
$ tensorboard --logdir=./lightning_logs --bind_all
```
![image.png](https://images.viblo.asia/eb2246f9-2747-43b9-8f78-d6c154144716.png)

## Evaluation

In our paper, we evaluate with two masking methods: simulation using a Markov chain and real loss traces from the PLC Challenge.

* Get the blind test set with loss traces:
```
$ wget http://plcchallenge2022pub.blob.core.windows.net/plcchallengearchive/blind.tar.gz
$ tar -xvf blind.tar.gz -C test_samples
```
* Modify `config.py` to change the evaluation setup if necessary.
* Run `main.py` with the version number to be evaluated:
```
$ python main.py --mode eval --version 0
```
During evaluation, several output samples are saved to `CONFIG.LOG.sample_path` for sanity checking.

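The expected loss rate of the simulated masking follows from the stationary distribution of the two-state Markov chain defined by each `(p_N, p_L)` pair in `CONFIG.DATA.EVAL.transition_probs` (the mapping noted in `config.py`, e.g. `(0.9, 0.1) ~ 10%`). A minimal sketch of that relationship; the helper name is ours, not part of the repository:

```python
def expected_loss_rate(p_n: float, p_l: float) -> float:
    """Stationary probability of the 'lost' state of the two-state chain
    [[p_n, 1 - p_n], [1 - p_l, p_l]] used by MaskGenerator."""
    return (1 - p_n) / ((1 - p_n) + (1 - p_l))

print(expected_loss_rate(0.9, 0.1))  # 0.1 -> ~10% of packets dropped
print(expected_loss_rate(0.8, 0.2))  # 0.2 -> ~20%
print(expected_loss_rate(0.6, 0.4))  # 0.4 -> ~40%
```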
## Configure a new dataset

Our implementation currently works with the VCTK dataset but can easily be extended to a new one.

* First, prepare `train.txt` and `test.txt`. See `./data/vctk/train.txt` and `./data/vctk/test.txt` for examples.
* Second, add a new dictionary to `CONFIG.DATA.data_dir`:
```
{
    'root': 'path/to/data/directory',
    'train': 'path/to/train.txt',
    'test': 'path/to/test.txt'
}
```
**Important:** Make sure each line in `train.txt` and `test.txt`, joined with `'root'`, is a valid path to its corresponding audio file; a complete sketch of the resulting config follows below.

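For instance, extending `CONFIG.DATA.data_dir` for a second corpus might look like the following sketch (the `mycorpus` name and its paths are hypothetical):

```python
# config.py (sketch) -- 'mycorpus' and its paths are illustrative only
class DATA:
    dataset = 'mycorpus'  # switch the active dataset here
    data_dir = {'vctk': {'root': 'data/vctk/wav48',
                         'train': 'data/vctk/train.txt',
                         'test': 'data/vctk/test.txt'},
                'mycorpus': {'root': 'data/mycorpus/wav',
                             'train': 'data/mycorpus/train.txt',
                             'test': 'data/mycorpus/test.txt'},
                }
```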
# 5. Audio generation

* To generate output audio, point `CONFIG.TEST.in_dir` to your input directory.
* Run `main.py`:
```
python main.py --mode test --version 0
```
The generated audio files are saved to `CONFIG.TEST.out_dir`.

## ONNX inferencing
We provide ONNX inferencing scripts and the best ONNX model (converted from the best checkpoint) at `lightning_logs/best_model.onnx`.
* Convert a checkpoint to an ONNX model:
```
python main.py --mode onnx --version 0
```
The converted ONNX model will be saved to `lightning_logs/version_0/checkpoints`.
* Put test audio files in `test_samples` and run inference with the converted ONNX model (see `inference_onnx.py` for more details):
```
python inference_onnx.py --onnx_path lightning_logs/version_0/frn.onnx
```
app (1).py
ADDED
@@ -0,0 +1,122 @@
import streamlit as st
import librosa
import soundfile as sf
import librosa.display
from config import CONFIG
import torch
from dataset import MaskGenerator
import onnxruntime, onnx
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas


@st.cache
def load_model():
    path = 'lightning_logs/version_0/checkpoints/frn.onnx'
    onnx_model = onnx.load(path)
    options = onnxruntime.SessionOptions()
    options.intra_op_num_threads = 2
    options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    session = onnxruntime.InferenceSession(path, options)
    input_names = [x.name for x in session.get_inputs()]
    output_names = [x.name for x in session.get_outputs()]
    return session, onnx_model, input_names, output_names


def inference(re_im, session, onnx_model, input_names, output_names):
    # Zero-initialize every model input (current STFT frame plus recurrent states)
    inputs = {input_names[i]: np.zeros([d.dim_value for d in _input.type.tensor_type.shape.dim],
                                       dtype=np.float32)
              for i, _input in enumerate(onnx_model.graph.input)
              }

    output_audio = []
    for t in range(re_im.shape[0]):
        inputs[input_names[0]] = re_im[t]
        out, prev_mag, predictor_state, mlp_state = session.run(output_names, inputs)
        # Feed the recurrent states back in for the next frame
        inputs[input_names[1]] = prev_mag
        inputs[input_names[2]] = predictor_state
        inputs[input_names[3]] = mlp_state
        output_audio.append(out)

    output_audio = torch.tensor(np.concatenate(output_audio, 0))
    output_audio = output_audio.permute(1, 0, 2).contiguous()
    output_audio = torch.view_as_complex(output_audio)
    output_audio = torch.istft(output_audio, window, stride, window=hann)
    return output_audio.numpy()


def visualize(hr, lr, recon):
    sr = CONFIG.DATA.sr
    window_size = 1024
    window = np.hanning(window_size)

    stft_hr = librosa.core.spectrum.stft(hr, n_fft=window_size, hop_length=512, window=window)
    stft_hr = 2 * np.abs(stft_hr) / np.sum(window)

    stft_lr = librosa.core.spectrum.stft(lr, n_fft=window_size, hop_length=512, window=window)
    stft_lr = 2 * np.abs(stft_lr) / np.sum(window)

    stft_recon = librosa.core.spectrum.stft(recon, n_fft=window_size, hop_length=512, window=window)
    stft_recon = 2 * np.abs(stft_recon) / np.sum(window)

    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharey=True, sharex=True, figsize=(16, 10))
    ax1.title.set_text('Target signal')
    ax2.title.set_text('Lossy signal')
    ax3.title.set_text('Enhanced signal')

    canvas = FigureCanvas(fig)
    p = librosa.display.specshow(librosa.amplitude_to_db(stft_hr), ax=ax1, y_axis='linear', x_axis='time', sr=sr)
    p = librosa.display.specshow(librosa.amplitude_to_db(stft_lr), ax=ax2, y_axis='linear', x_axis='time', sr=sr)
    p = librosa.display.specshow(librosa.amplitude_to_db(stft_recon), ax=ax3, y_axis='linear', x_axis='time', sr=sr)
    return fig


packet_size = CONFIG.DATA.EVAL.packet_size
window = CONFIG.DATA.window_size
stride = CONFIG.DATA.stride

title = 'Packet Loss Concealment'
st.set_page_config(page_title=title, page_icon=":sound:")
st.title(title)

st.subheader('Upload audio')
uploaded_file = st.file_uploader("Upload your audio file (.wav) at 48 kHz sampling rate")

is_file_uploaded = uploaded_file is not None
if not is_file_uploaded:
    uploaded_file = 'sample.wav'

target, sr = librosa.load(uploaded_file, sr=48000)
target = target[:packet_size * (len(target) // packet_size)]

st.text('Audio sample')
st.audio(uploaded_file)

st.subheader('Choose expected packet loss rate')
slider = [st.slider("Expected loss rate for Markov Chain loss generator", 0, 100, step=1)]
loss_percent = float(slider[0]) / 100
mask_gen = MaskGenerator(is_train=False, probs=[(1 - loss_percent, loss_percent)])
lossy_input = target.copy().reshape(-1, packet_size)
mask = mask_gen.gen_mask(len(lossy_input), seed=0)[:, np.newaxis]
lossy_input *= mask
lossy_input = lossy_input.reshape(-1)
hann = torch.sqrt(torch.hann_window(window))
lossy_input_tensor = torch.tensor(lossy_input)
re_im = torch.stft(lossy_input_tensor, window, stride, window=hann, return_complex=False).permute(1, 0, 2).unsqueeze(
    1).numpy().astype(np.float32)
session, onnx_model, input_names, output_names = load_model()

if st.button('Conceal lossy audio!'):
    with st.spinner('Please wait for completion'):
        output = inference(re_im, session, onnx_model, input_names, output_names)

    st.subheader('Visualization')
    fig = visualize(target, lossy_input, output)
    st.pyplot(fig)
    st.success('Done!')
    sf.write('target.wav', target, sr)
    sf.write('lossy.wav', lossy_input, sr)
    sf.write('enhanced.wav', output, sr)
    st.text('Original audio')
    st.audio('target.wav')
    st.text('Lossy audio')
    st.audio('lossy.wav')
    st.text('Enhanced audio')
    st.audio('enhanced.wav')
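Assuming the script is saved as `app.py` (as referenced by `app_file` in the README front matter) and the ONNX checkpoint exists at the path hard-coded in `load_model`, the demo launches with the standard Streamlit command:

```
$ streamlit run app.py
```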
config.py
ADDED
@@ -0,0 +1,59 @@
class CONFIG:
    gpus = "0,1"  # list of GPU devices

    # Training config
    class TRAIN:
        batch_size = 90  # number of audio files per batch
        lr = 1e-4  # learning rate
        epochs = 150  # max training epochs
        workers = 12  # number of dataloader workers
        val_split = 0.1  # validation set proportion
        clipping_val = 1.0  # gradient clipping value
        patience = 3  # learning rate scheduler's patience
        factor = 0.5  # learning rate reduction factor

    # Model config
    class MODEL:
        enc_layers = 4  # number of MLP blocks in the encoder
        enc_in_dim = 384  # dimension of the input projection layer in the encoder
        enc_dim = 768  # dimension of the MLP blocks
        pred_dim = 512  # dimension of the LSTM in the predictor
        pred_layers = 1  # number of LSTM layers in the predictor

    # Dataset config
    class DATA:
        dataset = 'vctk'  # dataset to use
        '''
        Dictionary that specifies the paths to the root directory and train/test text files of each dataset.
        'root' is the path to the dataset, and each line of the train.txt/test.txt files should contain the path to an
        audio file relative to 'root'.
        '''
        data_dir = {'vctk': {'root': 'data/vctk/wav48',
                             'train': "data/vctk/train.txt",
                             'test': "data/vctk/test.txt"},
                    }

        assert dataset in data_dir.keys(), 'Unknown dataset.'
        sr = 48000  # audio sampling rate
        audio_chunk_len = 122880  # size of the chunk taken from each audio file
        window_size = 960  # window size of the STFT operation, equivalent to the packet size (20 ms at 48 kHz)
        stride = 480  # stride of the STFT operation

        class TRAIN:
            packet_sizes = [256, 512, 768, 960, 1024,
                            1536]  # packet sizes for training; 'audio_chunk_len' should be divisible by all of them
            transition_probs = ((0.9, 0.1), (0.5, 0.1), (0.5, 0.5))  # list of transition probs for the Markov chain

        class EVAL:
            packet_size = 960  # 20 ms
            transition_probs = [(0.9, 0.1)]  # (0.9, 0.1) ~ 10%; (0.8, 0.2) ~ 20%; (0.6, 0.4) ~ 40%
            masking = 'gen'  # whether to use simulation ('gen') or real traces from Microsoft ('real') to generate masks
            assert masking in ['gen', 'real']
            trace_path = 'test_samples/blind/lossy_signals'  # must be specified if masking = 'real'

    class LOG:
        log_dir = 'lightning_logs'  # checkpoint and log directory
        sample_path = 'audio_samples'  # path to save generated audio samples in evaluation

    class TEST:
        in_dir = 'test_samples/blind/lossy_signals'  # path to test audio inputs
        out_dir = 'test_samples/blind/lossy_signals_out'  # path to generated outputs
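A quick sanity check of the timing constants above (plain arithmetic, not part of the commit): at `sr = 48000`, the 960-sample window spans exactly one 20 ms packet, and the 480-sample stride gives a 10 ms hop, i.e. 50% window overlap.

```python
sr, window_size, stride = 48000, 960, 480
print(window_size / sr * 1000)  # 20.0 -> ms per packet / STFT window
print(stride / sr * 1000)       # 10.0 -> ms hop (50% overlap)
```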
dataset.py
ADDED
@@ -0,0 +1,224 @@
import glob
import os
import random

import librosa
import numpy as np
import soundfile as sf
import torch
from numpy.random import default_rng
from pydtmc import MarkovChain
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset

from config import CONFIG

np.random.seed(0)
rng = default_rng()


def load_audio(
        path,
        sample_rate: int = 16000,
        chunk_len=None,
):
    with sf.SoundFile(path) as f:
        sr = f.samplerate
        audio_len = f.frames

        if chunk_len is not None and chunk_len < audio_len:
            start_index = torch.randint(0, audio_len - chunk_len, (1,))[0]

            frames = f._prepare_read(start_index, start_index + chunk_len, -1)
            audio = f.read(frames, always_2d=True, dtype="float32")

        else:
            audio = f.read(always_2d=True, dtype="float32")

        if sr != sample_rate:
            audio = librosa.resample(np.squeeze(audio), sr, sample_rate)[:, np.newaxis]

    return audio.T


def pad(sig, length):
    if sig.shape[1] < length:
        pad_len = length - sig.shape[1]
        sig = torch.hstack((sig, torch.zeros((sig.shape[0], pad_len))))
    else:
        start = random.randint(0, sig.shape[1] - length)
        sig = sig[:, start:start + length]
    return sig


class MaskGenerator:
    def __init__(self, is_train=True, probs=((0.9, 0.1), (0.5, 0.1), (0.5, 0.5))):
        '''
        is_train: if True, create a mask generator for training; otherwise for evaluation
        probs: a list of transition probabilities (p_N, p_L) for the Markov chain. Only 1 tuple is allowed if
               'is_train=False'
        '''
        self.is_train = is_train
        self.probs = probs
        self.mcs = []
        if self.is_train:
            for prob in probs:
                self.mcs.append(MarkovChain([[prob[0], 1 - prob[0]], [1 - prob[1], prob[1]]], ['1', '0']))
        else:
            assert len(probs) == 1
            prob = self.probs[0]
            self.mcs.append(MarkovChain([[prob[0], 1 - prob[0]], [1 - prob[1], prob[1]]], ['1', '0']))

    def gen_mask(self, length, seed=0):
        if self.is_train:
            mc = random.choice(self.mcs)
        else:
            mc = self.mcs[0]
        mask = mc.walk(length - 1, seed=seed)
        mask = np.array(list(map(int, mask)))
        return mask


class TestLoader(Dataset):
    def __init__(self):
        dataset_name = CONFIG.DATA.dataset
        self.mask = CONFIG.DATA.EVAL.masking

        self.target_root = CONFIG.DATA.data_dir[dataset_name]['root']
        txt_list = CONFIG.DATA.data_dir[dataset_name]['test']
        self.data_list = self.load_txt(txt_list)
        if self.mask == 'real':
            trace_txt = glob.glob(os.path.join(CONFIG.DATA.EVAL.trace_path, '*.txt'))
            trace_txt.sort()
            self.trace_list = [1 - np.array(list(map(int, open(txt, 'r').read().strip('\n').split('\n')))) for txt in
                               trace_txt]
        else:
            self.mask_generator = MaskGenerator(is_train=False, probs=CONFIG.DATA.EVAL.transition_probs)

        self.sr = CONFIG.DATA.sr
        self.stride = CONFIG.DATA.stride
        self.window_size = CONFIG.DATA.window_size
        self.audio_chunk_len = CONFIG.DATA.audio_chunk_len
        self.p_size = CONFIG.DATA.EVAL.packet_size  # 20 ms
        self.hann = torch.sqrt(torch.hann_window(self.window_size))

    def __len__(self):
        return len(self.data_list)

    def load_txt(self, txt_list):
        target = []
        with open(txt_list) as f:
            for line in f:
                target.append(os.path.join(self.target_root, line.strip('\n')))
        target = list(set(target))
        target.sort()
        return target

    def __getitem__(self, index):
        target = load_audio(self.data_list[index], sample_rate=self.sr)
        target = target[:, :(target.shape[1] // self.p_size) * self.p_size]

        sig = np.reshape(target, (-1, self.p_size)).copy()
        if self.mask == 'real':
            mask = self.trace_list[index % len(self.trace_list)]
            mask = np.repeat(mask, np.ceil(len(sig) / len(mask)), 0)[:len(sig)][:, np.newaxis]
        else:
            mask = self.mask_generator.gen_mask(len(sig), seed=index)[:, np.newaxis]
        sig *= mask
        sig = torch.tensor(sig).reshape(-1)

        target = torch.tensor(target).squeeze(0)

        sig_wav = sig.clone()
        target_wav = target.clone()

        target = torch.stft(target, self.window_size, self.stride, window=self.hann,
                            return_complex=False).permute(2, 0, 1)
        sig = torch.stft(sig, self.window_size, self.stride, window=self.hann, return_complex=False).permute(2, 0, 1)
        return sig.float(), target.float(), sig_wav, target_wav


class BlindTestLoader(Dataset):
    def __init__(self, test_dir):
        self.data_list = glob.glob(os.path.join(test_dir, '*.wav'))
        self.sr = CONFIG.DATA.sr
        self.stride = CONFIG.DATA.stride
        self.chunk_len = CONFIG.DATA.window_size
        self.hann = torch.sqrt(torch.hann_window(self.chunk_len))

    def __len__(self):
        return len(self.data_list)

    def __getitem__(self, index):
        sig = load_audio(self.data_list[index], sample_rate=self.sr)
        sig = torch.from_numpy(sig).squeeze(0)
        sig = torch.stft(sig, self.chunk_len, self.stride, window=self.hann, return_complex=False).permute(2, 0, 1)
        return sig.float()


class TrainDataset(Dataset):

    def __init__(self, mode='train'):
        dataset_name = CONFIG.DATA.dataset
        self.target_root = CONFIG.DATA.data_dir[dataset_name]['root']

        txt_list = CONFIG.DATA.data_dir[dataset_name]['train']
        self.data_list = self.load_txt(txt_list)

        if mode == 'train':
            self.data_list, _ = train_test_split(self.data_list, test_size=CONFIG.TRAIN.val_split, random_state=0)
        elif mode == 'val':
            _, self.data_list = train_test_split(self.data_list, test_size=CONFIG.TRAIN.val_split, random_state=0)

        self.p_sizes = CONFIG.DATA.TRAIN.packet_sizes
        self.mode = mode
        self.sr = CONFIG.DATA.sr
        self.window = CONFIG.DATA.audio_chunk_len
        self.stride = CONFIG.DATA.stride
        self.chunk_len = CONFIG.DATA.window_size
        self.hann = torch.sqrt(torch.hann_window(self.chunk_len))
        self.mask_generator = MaskGenerator(is_train=True, probs=CONFIG.DATA.TRAIN.transition_probs)

    def __len__(self):
        return len(self.data_list)

    def load_txt(self, txt_list):
        target = []
        with open(txt_list) as f:
            for line in f:
                target.append(os.path.join(self.target_root, line.strip('\n')))
        target = list(set(target))
        target.sort()
        return target

    def fetch_audio(self, index):
        sig = load_audio(self.data_list[index], sample_rate=self.sr, chunk_len=self.window)
        while sig.shape[1] < self.window:
            idx = torch.randint(0, len(self.data_list), (1,))[0]
            pad_len = self.window - sig.shape[1]
            if pad_len < 0.02 * self.sr:
                padding = np.zeros((1, pad_len), dtype=np.float32)  # np.float is deprecated in NumPy >= 1.20
            else:
                padding = load_audio(self.data_list[idx], sample_rate=self.sr, chunk_len=pad_len)
            sig = np.hstack((sig, padding))
        return sig

    def __getitem__(self, index):
        sig = self.fetch_audio(index)

        sig = sig.reshape(-1).astype(np.float32)

        target = torch.tensor(sig.copy())
        p_size = random.choice(self.p_sizes)

        sig = np.reshape(sig, (-1, p_size))
        mask = self.mask_generator.gen_mask(len(sig), seed=index)[:, np.newaxis]
        sig *= mask
        sig = torch.tensor(sig.copy()).reshape(-1)

        target = torch.stft(target, self.chunk_len, self.stride, window=self.hann,
                            return_complex=False).permute(2, 0, 1).float()
        sig = torch.stft(sig, self.chunk_len, self.stride, window=self.hann, return_complex=False)
        sig = sig.permute(2, 0, 1).float()
        return sig, target
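A short usage sketch for the datasets above (a guess at how `main.py` consumes them; the exact wiring there is not part of this commit):

```python
from torch.utils.data import DataLoader

from config import CONFIG
from dataset import TrainDataset

train_loader = DataLoader(TrainDataset(mode='train'), batch_size=CONFIG.TRAIN.batch_size,
                          num_workers=CONFIG.TRAIN.workers, shuffle=True)
val_loader = DataLoader(TrainDataset(mode='val'), batch_size=CONFIG.TRAIN.batch_size,
                        num_workers=CONFIG.TRAIN.workers)
sig, target = next(iter(train_loader))  # each: (batch, 2, freq_bins, frames) real/imag STFTs
```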
gitattributes
ADDED
@@ -0,0 +1,5 @@
lightning_logs/version_0/checkpoints/frn-epoch=65-val_loss=0.2290.ckpt filter=lfs diff=lfs merge=lfs -text
lightning_logs/version_0/checkpoints/frn.onnx filter=lfs diff=lfs merge=lfs -text
lightning_logs/predictor/checkpoints/predictor.ckpt filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
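These patterns route the checkpoints and ONNX weights through Git LFS, so a plain `git clone` only fetches pointer files. To materialize the binaries (standard Git LFS commands):

```
$ git lfs install
$ git lfs pull
```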
index.html
ADDED
@@ -0,0 +1,139 @@
1 |
+
<!DOCTYPE html>
|
2 |
+
<html>
|
3 |
+
<head>
|
4 |
+
<link href="css/styles.css" rel="stylesheet">
|
5 |
+
|
6 |
+
<title>Full-band Recurrent Network</title>
|
7 |
+
</head>
|
8 |
+
<body>
|
9 |
+
<nav>
|
10 |
+
<ul>
|
11 |
+
<!-- <li><a href="/">Home</a></li> -->
|
12 |
+
<li><a href="https://github.com/Crystalsound/FRN/">Github</a></li>
|
13 |
+
<li><a href="https://arxiv.org/abs/2211.04071">Arxiv</a></li>
|
14 |
+
<li><a href="https://www.namitech.io/">Website</a></li>
|
15 |
+
</ul>
|
16 |
+
</nav>
|
17 |
+
<div class=”container”>
|
18 |
+
<div class=”blurb”>
|
19 |
+
<h1>Audio samples</h1>
|
20 |
+
<p><b>Improving performance of real-time full-band blind packet-loss concealment with predictive network</b></a>
|
21 |
+
</p>
|
22 |
+
<p><i>Viet-Anh Nguyen<sup>1</sup>, Anh H. T. Nguyen<sup>1</sup>, and Andy W. H. Khong<sup>2</sup></i>
|
23 |
+
<br><sup>1</sup>Crystalsound Team, NamiTech JSC, Ho Chi Minh City, Vietnam
|
24 |
+
<br><sup>2</sup>Nanyang Technological University, Singapore
|
25 |
+
<br><TT>{vietanh.nguyen, anh.nguyen}@namitech.io, [email protected]
|
26 |
+
</div>
|
27 |
+
</div>
|
28 |
+
<h3> Audio samples of our full-band recurrent network (FRN) versus TFGAN and tPLCNet for blind packet loss concealment
|
29 |
+
(PLC)</h3>
|
30 |
+
Audio files are at 48 kHz sampling rate with packet size of 20 ms. Our FRN is a causal and blind PLC model while TFGAN
|
31 |
+
is non-causal and tPLC is an informed PLC model.
|
32 |
+
<br> </br>
<table>
    <thead>
    <tr>
        <th align="middle">Clean target</th>
        <th align="middle">Lossy input</th>
        <th align="middle">TFGAN</th>
        <th align="middle">tPLCNet</th>
        <th align="middle">FRN (Ours)</th>
    </tr>
    </thead>

    <tbody>
    <tr>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_1/clean.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_1/lossy.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_1/TFGAN_enhanced.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_1/tPLC_enhanced.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_1/FRN_enhanced.wav" type="audio/wav">
            </audio>
        </td>
    </tr>

    <tr>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_2/clean.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_2/lossy.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_2/TFGAN_enhanced.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_2/tPLC_enhanced.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_2/FRN_enhanced.wav" type="audio/wav">
            </audio>
        </td>
    </tr>

    <tr>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_3/clean.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_3/lossy.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_3/TFGAN_enhanced.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_3/tPLC_enhanced.wav" type="audio/wav">
            </audio>
        </td>
        <td>
            <audio controls style="width: 250px; height: 50px">
                <source src="audio_samples/sample_3/FRN_enhanced.wav" type="audio/wav">
            </audio>
        </td>
    </tr>
    </tbody>
</table>
</body>
</html>
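Aside: the page above quotes a packet size of 20 ms at 48 kHz, i.e. 960 samples per packet. A toy sketch of how such a lossy input can be simulated by zeroing dropped packets (illustrative only; the repo's actual masking logic lives in dataset.py):

import numpy as np

sr = 48000
packet = int(0.02 * sr)  # one 20 ms packet = 960 samples at 48 kHz

rng = np.random.default_rng(0)
x = rng.standard_normal(sr)  # 1 s of toy "audio"
lossy = x.copy()
for start in range(0, len(x), packet):
    if rng.random() < 0.1:  # drop roughly 10% of packets
        lossy[start:start + packet] = 0.0

A blind PLC model such as FRN sees only this zero-filled stream, whereas an informed model like tPLCNet also receives the loss mask.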
inference_onnx.py
ADDED
@@ -0,0 +1,63 @@
import argparse
import glob
import os

import librosa
import numpy as np
import onnx
import onnxruntime
import soundfile as sf
import torch
import tqdm

from config import CONFIG

parser = argparse.ArgumentParser()
parser.add_argument('--onnx_path', default=None,
                    help='path to the exported ONNX model')
args = parser.parse_args()

if __name__ == '__main__':
    path = args.onnx_path
    window = CONFIG.DATA.window_size
    stride = CONFIG.DATA.stride

    # Load the graph twice: once with `onnx` to read the declared input shapes,
    # once with `onnxruntime` to actually run it.
    onnx_model = onnx.load(path)
    options = onnxruntime.SessionOptions()
    options.intra_op_num_threads = 8
    options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    session = onnxruntime.InferenceSession(path, options)
    input_names = [x.name for x in session.get_inputs()]
    output_names = [x.name for x in session.get_outputs()]
    print(input_names)
    print(output_names)

    audio_files = glob.glob(os.path.join(CONFIG.TEST.in_dir, '*.wav'))
    # The same square-root Hann window is used for analysis and synthesis,
    # so the STFT -> iSTFT round trip reconstructs the signal.
    hann = torch.sqrt(torch.hann_window(window))
    os.makedirs(CONFIG.TEST.out_dir, exist_ok=True)
    for file in tqdm.tqdm(audio_files, total=len(audio_files)):
        sig, _ = librosa.load(file, sr=48000)
        sig = torch.tensor(sig)
        # (freq, time, 2) -> (time, 1, freq, 2): one real/imag spectrum per frame.
        re_im = torch.stft(sig, window, stride, window=hann, return_complex=False).permute(1, 0, 2).unsqueeze(
            1).numpy().astype(np.float32)

        # Zero-initialize every graph input (spectrum frame and recurrent states)
        # with the shapes declared in the ONNX graph.
        inputs = {input_names[i]: np.zeros([d.dim_value for d in _input.type.tensor_type.shape.dim],
                                           dtype=np.float32)
                  for i, _input in enumerate(onnx_model.graph.input)
                  }

        output_audio = []
        for t in range(re_im.shape[0]):
            # Feed one frame, then carry the returned states into the next step.
            inputs[input_names[0]] = re_im[t]
            out, prev_mag, predictor_state, mlp_state = session.run(output_names, inputs)
            inputs[input_names[1]] = prev_mag
            inputs[input_names[2]] = predictor_state
            inputs[input_names[3]] = mlp_state
            output_audio.append(out)

        # Stack the frames back to (freq, time, 2), convert to complex and invert.
        output_audio = torch.tensor(np.concatenate(output_audio, 0))
        output_audio = output_audio.permute(1, 0, 2).contiguous()
        output_audio = torch.view_as_complex(output_audio)
        output_audio = torch.istft(output_audio, window, stride, window=hann)
        sf.write(os.path.join(CONFIG.TEST.out_dir, os.path.basename(file)), output_audio.numpy(),
                 samplerate=48000, subtype='PCM_16')
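Aside: the script above relies on the square-root Hann window satisfying the overlap-add condition at the configured stride, so that an STFT followed by an iSTFT with the same window reconstructs the input (near-)exactly. A minimal round-trip check of that property, using illustrative framing values rather than the ones in config.py:

import torch

# Illustrative parameters; the real values come from CONFIG.DATA.
window, stride = 1024, 512
hann = torch.sqrt(torch.hann_window(window))

x = torch.randn(48000)
spec = torch.stft(x, window, stride, window=hann, return_complex=True)
x_hat = torch.istft(spec, window, stride, window=hann)

# istft returns a slightly shorter signal; the overlapping interior matches.
print(torch.allclose(x[:x_hat.numel()], x_hat, atol=1e-5))  # True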
loss.py
ADDED
@@ -0,0 +1,145 @@
import librosa
import pytorch_lightning as pl
import torch
from auraloss.freq import STFTLoss, MultiResolutionSTFTLoss, apply_reduction, SpectralConvergenceLoss, STFTMagnitudeLoss

from config import CONFIG


class STFTLossDDP(STFTLoss):
    # Variant of auraloss's STFTLoss that moves its window/filterbank to the
    # input's device at call time, so it is safe under DDP training, and adds
    # power-law magnitude compression.
    def __init__(self,
                 fft_size=1024,
                 hop_size=256,
                 win_length=1024,
                 window="hann_window",
                 w_sc=1.0,
                 w_log_mag=1.0,
                 w_lin_mag=0.0,
                 w_phs=0.0,
                 sample_rate=None,
                 scale=None,
                 n_bins=None,
                 scale_invariance=False,
                 eps=1e-8,
                 output="loss",
                 reduction="mean",
                 device=None):
        super(STFTLoss, self).__init__()
        self.fft_size = fft_size
        self.hop_size = hop_size
        self.win_length = win_length
        self.window = getattr(torch, window)(win_length)
        self.w_sc = w_sc
        self.w_log_mag = w_log_mag
        self.w_lin_mag = w_lin_mag
        self.w_phs = w_phs
        self.sample_rate = sample_rate
        self.scale = scale
        self.n_bins = n_bins
        self.scale_invariance = scale_invariance
        self.eps = eps
        self.output = output
        self.reduction = reduction
        self.device = device

        self.spectralconv = SpectralConvergenceLoss()
        self.logstft = STFTMagnitudeLoss(log=True, reduction=reduction)
        self.linstft = STFTMagnitudeLoss(log=False, reduction=reduction)

        # set up an optional mel/chroma filterbank
        if self.scale == "mel":
            assert (sample_rate is not None)  # Must set sample rate to use mel scale
            assert (n_bins <= fft_size)  # Must be more FFT bins than mel bins
            fb = librosa.filters.mel(sample_rate, fft_size, n_mels=n_bins)
            self.fb = torch.tensor(fb).unsqueeze(0)
        elif self.scale == "chroma":
            assert (sample_rate is not None)  # Must set sample rate to use chroma scale
            assert (n_bins <= fft_size)  # Must be more FFT bins than chroma bins
            fb = librosa.filters.chroma(sample_rate, fft_size, n_chroma=n_bins)
            self.fb = torch.tensor(fb).unsqueeze(0)

        if scale is not None and device is not None:
            self.fb = self.fb.to(self.device)  # move filterbank to device

    def compressed_loss(self, x, y, alpha=None):
        self.window = self.window.to(x.device)
        x_mag, x_phs = self.stft(x.view(-1, x.size(-1)))
        y_mag, y_phs = self.stft(y.view(-1, y.size(-1)))

        # power-law compression (|X| ** alpha) emphasizes low-energy components
        if alpha is not None:
            x_mag = x_mag ** alpha
            y_mag = y_mag ** alpha

        # apply relevant transforms
        if self.scale is not None:
            x_mag = torch.matmul(self.fb.to(x_mag.device), x_mag)
            y_mag = torch.matmul(self.fb.to(y_mag.device), y_mag)

        # normalize scales
        if self.scale_invariance:
            alpha = (x_mag * y_mag).sum([-2, -1]) / ((y_mag ** 2).sum([-2, -1]))
            y_mag = y_mag * alpha.unsqueeze(-1)

        # compute loss terms
        sc_loss = self.spectralconv(x_mag, y_mag) if self.w_sc else 0.0
        mag_loss = self.logstft(x_mag, y_mag) if self.w_log_mag else 0.0
        lin_loss = self.linstft(x_mag, y_mag) if self.w_lin_mag else 0.0

        # combine loss terms
        loss = (self.w_sc * sc_loss) + (self.w_log_mag * mag_loss) + (self.w_lin_mag * lin_loss)
        loss = apply_reduction(loss, reduction=self.reduction)
        return loss

    def forward(self, x, y):
        return self.compressed_loss(x, y, 0.3)


class MRSTFTLossDDP(MultiResolutionSTFTLoss):
    def __init__(self,
                 fft_sizes=(1024, 2048, 512),
                 hop_sizes=(120, 240, 50),
                 win_lengths=(600, 1200, 240),
                 window="hann_window",
                 w_sc=1.0,
                 w_log_mag=1.0,
                 w_lin_mag=0.0,
                 w_phs=0.0,
                 sample_rate=None,
                 scale=None,
                 n_bins=None,
                 scale_invariance=False,
                 **kwargs):
        super(MultiResolutionSTFTLoss, self).__init__()
        assert len(fft_sizes) == len(hop_sizes) == len(win_lengths)  # must define all
        self.stft_losses = torch.nn.ModuleList()
        for fs, ss, wl in zip(fft_sizes, hop_sizes, win_lengths):
            self.stft_losses += [STFTLossDDP(fs,
                                             ss,
                                             wl,
                                             window,
                                             w_sc,
                                             w_log_mag,
                                             w_lin_mag,
                                             w_phs,
                                             sample_rate,
                                             scale,
                                             n_bins,
                                             scale_invariance,
                                             **kwargs)]


class Loss(pl.LightningModule):
    # Inverts predicted and target spectrograms back to waveforms, then scores
    # them with the multi-resolution compressed STFT loss above.
    def __init__(self):
        super(Loss, self).__init__()
        self.stft_loss = MRSTFTLossDDP(sample_rate=CONFIG.DATA.sr, device="cpu", w_log_mag=0.0, w_lin_mag=1.0)
        self.window = torch.sqrt(torch.hann_window(CONFIG.DATA.window_size))

    def forward(self, x, y):
        # (batch, re/im, freq, time) -> (batch, freq, time, 2) -> complex -> waveform
        x = x.permute(0, 2, 3, 1)
        y = y.permute(0, 2, 3, 1)
        wave_x = torch.istft(torch.view_as_complex(x.contiguous()), CONFIG.DATA.window_size, CONFIG.DATA.stride,
                             window=self.window.to(x.device))
        wave_y = torch.istft(torch.view_as_complex(y.contiguous()), CONFIG.DATA.window_size, CONFIG.DATA.stride,
                             window=self.window.to(y.device))
        loss = self.stft_loss(wave_x, wave_y)
        return loss
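Aside: with w_log_mag=0.0 and w_lin_mag=1.0, the criterion reduces to spectral-convergence plus linear-magnitude terms computed on power-compressed magnitudes |X| ** 0.3 at three STFT resolutions. A minimal smoke test of the Loss module; the input layout here is an assumption, namely (batch, real/imag, freq, frames) spectrograms matching CONFIG.DATA.window_size:

import torch
from config import CONFIG
from loss import Loss

criterion = Loss()
freq = CONFIG.DATA.window_size // 2 + 1  # one-sided STFT bins
frames = 100  # arbitrary frame count for the smoke test
x = torch.randn(2, 2, freq, frames)  # (batch, re/im, freq, time)
y = torch.randn(2, 2, freq, frames)
print(criterion(x, y))  # scalar loss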
main.py
ADDED
@@ -0,0 +1,131 @@
import argparse
import os

import pytorch_lightning as pl
import soundfile as sf
import torch
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.utilities.model_summary import summarize
from torch.utils.data import DataLoader

from config import CONFIG
from dataset import TrainDataset, TestLoader, BlindTestLoader
from models.frn import PLCModel, OnnxWrapper
from utils.tblogger import TensorBoardLoggerExpanded
from utils.utils import mkdir_p

parser = argparse.ArgumentParser()
parser.add_argument('--version', default=None,
                    help='version to resume')
parser.add_argument('--mode', default='train',
                    help='training or testing mode')

args = parser.parse_args()
os.environ["CUDA_VISIBLE_DEVICES"] = str(CONFIG.gpus)
assert args.mode in ['train', 'eval', 'test', 'onnx'], "--mode should be 'train', 'eval', 'test' or 'onnx'"


def resume(train_dataset, val_dataset, version):
    # Reload a PLCModel from the checkpoint of a given lightning_logs version.
    print("Version", version)
    model_path = os.path.join(CONFIG.LOG.log_dir, 'version_{}/checkpoints/'.format(str(version)))
    config_path = os.path.join(CONFIG.LOG.log_dir, 'version_{}/'.format(str(version)) + 'hparams.yaml')
    model_name = [x for x in os.listdir(model_path) if x.endswith(".ckpt")][0]
    ckpt_path = model_path + model_name
    checkpoint = PLCModel.load_from_checkpoint(ckpt_path,
                                               strict=True,
                                               hparams_file=config_path,
                                               train_dataset=train_dataset,
                                               val_dataset=val_dataset,
                                               window_size=CONFIG.DATA.window_size)

    return checkpoint


def train():
    train_dataset = TrainDataset('train')
    val_dataset = TrainDataset('val')
    checkpoint_callback = ModelCheckpoint(monitor='val_loss', mode='min', verbose=True,
                                          filename='frn-{epoch:02d}-{val_loss:.4f}', save_weights_only=False)
    gpus = CONFIG.gpus.split(',')
    logger = TensorBoardLoggerExpanded(CONFIG.DATA.sr)
    if args.version is not None:
        model = resume(train_dataset, val_dataset, args.version)
    else:
        model = PLCModel(train_dataset,
                         val_dataset,
                         window_size=CONFIG.DATA.window_size,
                         enc_layers=CONFIG.MODEL.enc_layers,
                         enc_in_dim=CONFIG.MODEL.enc_in_dim,
                         enc_dim=CONFIG.MODEL.enc_dim,
                         pred_dim=CONFIG.MODEL.pred_dim,
                         pred_layers=CONFIG.MODEL.pred_layers)

    trainer = pl.Trainer(logger=logger,
                         gradient_clip_val=CONFIG.TRAIN.clipping_val,
                         gpus=len(gpus),
                         max_epochs=CONFIG.TRAIN.epochs,
                         accelerator="gpu" if len(gpus) > 1 else None,
                         callbacks=[checkpoint_callback]
                         )

    print(model.hparams)
    print(
        'Dataset: {}, Train files: {}, Val files: {}'.format(CONFIG.DATA.dataset, len(train_dataset), len(val_dataset)))
    trainer.fit(model)


def to_onnx(model, onnx_path):
    # Export the trained model, wrapped for stateful frame-by-frame inference, to ONNX.
    model.eval()

    model = OnnxWrapper(model)

    torch.onnx.export(model,
                      model.sample,
                      onnx_path,
                      export_params=True,
                      opset_version=12,
                      input_names=model.input_names,
                      output_names=model.output_names,
                      do_constant_folding=True,
                      verbose=False)


if __name__ == '__main__':

    if args.mode == 'train':
        train()
    else:
        model = resume(None, None, args.version)
        print(model.hparams)
        print(summarize(model))

        model.eval()
        model.freeze()
        if args.mode == 'eval':
            model.cuda(device=0)
            trainer = pl.Trainer(accelerator='gpu', devices=1, enable_checkpointing=False, logger=False)
            testset = TestLoader()
            test_loader = DataLoader(testset, batch_size=1, num_workers=4)
            trainer.test(model, test_loader)
            print('Version', args.version)
            masking = CONFIG.DATA.EVAL.masking
            prob = CONFIG.DATA.EVAL.transition_probs[0]
            # expected loss rate = stationary probability of the 'lost' state of a
            # two-state Markov chain with self-transition probs prob[0] and prob[1]
            loss_percent = (1 - prob[0]) / (2 - prob[0] - prob[1]) * 100
            print('Evaluate with real trace' if masking == 'real' else
                  'Evaluate with generated trace with {:.2f}% packet loss'.format(loss_percent))
        elif args.mode == 'test':
            model.cuda(device=0)
            testset = BlindTestLoader(test_dir=CONFIG.TEST.in_dir)
            test_loader = DataLoader(testset, batch_size=1, num_workers=4)
            trainer = pl.Trainer(accelerator='gpu', devices=1, enable_checkpointing=False, logger=False)
            preds = trainer.predict(model, test_loader, return_predictions=True)
            mkdir_p(CONFIG.TEST.out_dir)
            for idx, path in enumerate(test_loader.dataset.data_list):
                out_path = os.path.join(CONFIG.TEST.out_dir, os.path.basename(path))
                sf.write(out_path, preds[idx], samplerate=CONFIG.DATA.sr, subtype='PCM_16')

        else:
            onnx_path = 'lightning_logs/version_{}/checkpoints/frn.onnx'.format(str(args.version))
            to_onnx(model, onnx_path)
            print('ONNX model saved to', onnx_path)
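Aside: the loss_percent printed in eval mode is the stationary probability of the "lost" state of a two-state Markov (Gilbert-style) packet-loss model, where prob[0] and prob[1] are the probabilities of staying intact and staying lost. Solving pi = pi @ P gives pi_lost = (1 - prob[0]) / (2 - prob[0] - prob[1]). A quick numerical check of that closed form, using made-up transition probabilities rather than the ones in config.py:

import numpy as np

p_keep, p_stay_lost = 0.9, 0.5  # illustrative self-transition probabilities
P = np.array([[p_keep, 1 - p_keep],
              [1 - p_stay_lost, p_stay_lost]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

closed_form = (1 - p_keep) / (2 - p_keep - p_stay_lost)
print(pi[1], closed_form)  # both ~0.1667, i.e. ~16.67% packet loss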
requirements (1).txt
ADDED
@@ -0,0 +1,18 @@
auraloss==0.3.0
einops==0.6.0
librosa==0.9.2
matplotlib==3.5.3
numpy==1.22.3
onnxruntime==1.13.1
pandas==1.5.3
pydtmc==7.0.0
pytorch_lightning==1.9.0
scikit_learn==1.2.1
soundfile==0.11.0
torch==1.13.1
torchmetrics==0.11.0
tqdm==4.64.0
pystoi==0.3.3
pesq==0.0.4
onnx==1.13.0
altair<5
sample.wav
ADDED
Binary file (797 kB).