usc-ur

usc-ur OP t1_jd3nzab wrote

The main purpose of this project is to bring together, in a single environment, all the resources (models, prompts, APIs, etc.) related to LLMs. We also think from an end-user perspective: it is highly unlikely that a user would type a complex context into a query to a model or search engine. In this project, we bias the different models' responses toward different styles/behaviors, while hiding this from end users.
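As a rough illustration of what "biasing responses while hiding the context" can mean, here is a minimal sketch. The template, behavior names, and function names are hypothetical, not the project's actual API: the idea is simply that a hidden instruction is prepended to the user's raw query before it reaches the model, and the user only ever sees their own query.

```python
# Hypothetical hidden-context wrapper: the behavior instruction is
# prepended server-side and never shown to the end user.
HIDDEN_CONTEXTS = {
    "teacher": "Explain step by step, as if to a beginner.",
    "concise": "Answer in one short sentence.",
}

def build_prompt(user_query: str, behavior: str) -> str:
    """Prepend the hidden behavior context to the raw user query."""
    context = HIDDEN_CONTEXTS[behavior]
    return f"{context}\n\nUser: {user_query}"

def visible_query(prompt: str) -> str:
    """What the end user sees: only their own query, not the context."""
    return prompt.split("User: ", 1)[1]

prompt = build_prompt("What is perplexity?", "teacher")
assert visible_query(prompt) == "What is perplexity?"
assert prompt.startswith(HIDDEN_CONTEXTS["teacher"])
```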

1

usc-ur OP t1_jcb5zvs wrote

We have also connected this tool with the [Awesome ChatGPT Prompts dataset](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts) and [repo](https://github.com/f/awesome-chatgpt-prompts) from Fatih Kadir Akın.
Our purpose is to connect all this data with the API and make the process **fully transparent** to the end user.

1

usc-ur OP t1_jca4ot7 wrote

Sure! The idea is that you build a language model from a given corpus (let's say the BNC) and then use a similarity measure, in this case perplexity (though it could be another one), to test how well your sample (sentence) "fits" the model's distribution. Since we assume the distribution is correct, this allows us to identify malformed sentences. You can also check the paper here: https://www.cambridge.org/core/journals/natural-language-engineering/article/an-unsupervised-perplexitybased-method-for-boilerplate-removal/5E589D838F1D1E0736B4F52001150339#article

2