updated paper reference in the LICENSE file (#31)

- updated paper reference in the LICENSE file
- improved documentation
Robert Gerstenberger 2024-06-03 14:03:59 +02:00 committed by GitHub
parent 2978238318
commit 15fb8e661d
3 changed files with 26 additions and 25 deletions

View File

@ -46,7 +46,8 @@ following citation:
----------------------------------------------------------------------
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas
Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michał Podstawski, Hubert
Niewiadomski, Piotr Nyczyk, Torsten Hoefler (2024): Graph of Thoughts:
Solving Elaborate Problems with Large Language Models. In: Proceedings
of the AAAI Conference on Artificial Intelligence, 38(16),
17682-17690. https://doi.org/10.1609/aaai.v38i16.29720
----------------------------------------------------------------------

View File

@ -4,7 +4,7 @@ The Language Models module is responsible for managing the large language models
Currently, the framework supports the following LLMs:
- GPT-4 / GPT-3.5 (Remote - OpenAI API)
- LLaMA-2 (Local - HuggingFace Transformers)

The following sections describe how to instantiate individual LLMs and how to add new LLMs to the framework.
@ -13,50 +13,50 @@ The following sections describe how to instantiate individual LLMs and how to ad
- Fill in the configuration details based on the model used (below).
### GPT-4 / GPT-3.5
- Adjust the predefined `chatgpt` or `chatgpt4` configurations or create a new configuration with a unique key.
| Key | Value |
|---------------------|-------|
| model_id | Model name based on the [OpenAI model overview](https://platform.openai.com/docs/models/overview). |
| prompt_token_cost | Price per 1000 prompt tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating the cumulative price per LLM instance. |
| response_token_cost | Price per 1000 response tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating the cumulative price per LLM instance. |
| temperature | Parameter of OpenAI models that controls the randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 2.0, default is 1.0. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature). |
| max_tokens | The maximum number of tokens to generate in the chat completion. The value depends on the maximum context size of the model specified in the [OpenAI model overview](https://platform.openai.com/docs/models/overview). More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens). |
| stop | String or array of strings specifying sequences of characters which, if detected, stop further generation of tokens. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-stop). |
| organization | Organization to use for the API requests (may be empty). |
| api_key | Personal API key that will be used to access the OpenAI API. |
- Instantiate the language model based on the selected configuration key (predefined / custom).
```python
lm = controller.ChatGPT(
    "path/to/config.json",
    model_name=<configuration key>
)
```
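For orientation, a configuration entry for the `chatgpt` key might look like the following sketch. The field names come from the table above; all values (model name, prices, limits) are illustrative placeholders and should be checked against the current OpenAI model overview and pricing pages:

```json
{
    "chatgpt": {
        "model_id": "gpt-3.5-turbo",
        "prompt_token_cost": 0.0015,
        "response_token_cost": 0.002,
        "temperature": 1.0,
        "max_tokens": 4096,
        "stop": null,
        "organization": "",
        "api_key": "<your OpenAI API key>"
    }
}
```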
### LLaMA-2
- Requires local hardware to run inference and a HuggingFace account.
- Adjust the predefined `llama7b-hf`, `llama13b-hf` or `llama70b-hf` configurations or create a new configuration with a unique key.
| Key | Value |
|---------------------|-------|
| model_id | Specifies the HuggingFace LLaMA-2 model identifier (`meta-llama/<model_id>`). |
| cache_dir | Local directory where the model will be downloaded and accessed. |
| prompt_token_cost | Price per 1000 prompt tokens (currently not used - local model = no cost). |
| response_token_cost | Price per 1000 response tokens (currently not used - local model = no cost). |
| temperature | Parameter that controls the randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 1.0, default is 0.6. |
| top_k | Top-K sampling method described in the [Transformers tutorial](https://huggingface.co/blog/how-to-generate). Default value is set to 10. |
| max_tokens | The maximum number of tokens to generate in the chat completion. More tokens require more memory. |
- Instantiate the language model based on the selected configuration key (predefined / custom).
```python
lm = controller.Llama2HF(
    "path/to/config.json",
    model_name=<configuration key>
)
```
- Request access to LLaMA-2 via the [Meta form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) using the same email address as for the HuggingFace account.
- After the access is granted, go to the [HuggingFace LLaMA-2 model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), log in and accept the license (a _"You have been granted access to this model"_ message should appear).
- Generate a HuggingFace access token.
- Log in from the CLI with: `huggingface-cli login --token <your token>`.
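Analogously to the GPT models, a configuration entry for the `llama7b-hf` key might look like the sketch below. The field names come from the table above; the concrete values (cache path, sampling parameters, token limit) are illustrative assumptions:

```json
{
    "llama7b-hf": {
        "model_id": "Llama-2-7b-chat-hf",
        "cache_dir": "/path/to/model/cache",
        "prompt_token_cost": 0.0,
        "response_token_cost": 0.0,
        "temperature": 0.6,
        "top_k": 10,
        "max_tokens": 512
    }
}
```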
@ -64,9 +64,9 @@ Note: 4-bit quantization is used to reduce the model size for inference. During
## Adding LLMs
More LLMs can be added by following these steps:
- Create a new class as a subclass of `AbstractLanguageModel`.
- Use the constructor for loading the configuration and instantiating the language model (if needed).
```python
class CustomLanguageModel(AbstractLanguageModel):
    def __init__(
        self,
```
@ -81,15 +81,15 @@ class CustomLanguageModel(AbstractLanguageModel):
```python
        # Instantiate LLM if needed
```
- Implement the `query` abstract method that is used to get a list of responses from the LLM (remote API call or local model inference).
```python
def query(self, query: str, num_responses: int = 1) -> Any:
    # Support caching
    # Call LLM and retrieve list of responses - based on num_responses
    # Return LLM response structure (not only raw strings)
```
- Implement the `get_response_texts` abstract method that is used to get a list of raw texts from the LLM response structure produced by `query`.
```python
def get_response_texts(self, query_response: Union[List[Any], Any]) -> List[str]:
    # Retrieve list of raw strings from the LLM response structure
```
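Putting the steps above together, a minimal custom model might look like the following sketch. `EchoLanguageModel` and its canned "echo" responses are hypothetical stand-ins for a real API call or local inference, and the `AbstractLanguageModel` shown here is a simplified stand-in included only so the sketch is self-contained; in the framework you would import the real base class instead:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Union


# Simplified stand-in for the framework's AbstractLanguageModel,
# shown only to make this sketch self-contained.
class AbstractLanguageModel(ABC):
    @abstractmethod
    def query(self, query: str, num_responses: int = 1) -> Any: ...

    @abstractmethod
    def get_response_texts(self, query_response: Union[List[Any], Any]) -> List[str]: ...


class EchoLanguageModel(AbstractLanguageModel):
    """Toy model that echoes the prompt; a real subclass would call a
    remote API or run local inference instead."""

    def __init__(self) -> None:
        # Support caching: remember responses per (query, num_responses).
        self.cache: Dict[str, List[Dict]] = {}

    def query(self, query: str, num_responses: int = 1) -> List[Dict]:
        key = f"{query}:{num_responses}"
        if key in self.cache:
            return self.cache[key]
        # Return a response *structure*, not only raw strings.
        responses = [{"text": f"echo: {query}"} for _ in range(num_responses)]
        self.cache[key] = responses
        return responses

    def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
        # Retrieve the list of raw strings from the response structure.
        if not isinstance(query_response, list):
            query_response = [query_response]
        return [r["text"] for r in query_response]


lm = EchoLanguageModel()
responses = lm.query("2 + 2 = ?", num_responses=2)
print(lm.get_response_texts(responses))  # ['echo: 2 + 2 = ?', 'echo: 2 + 2 = ?']
```

The split between `query` (structured responses, cacheable) and `get_response_texts` (plain strings) mirrors the two abstract methods described above.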

View File

@ -7,6 +7,6 @@ The poster presented at the 2024 Association for the Advancement of Artificial I
## Plot Data
The data used to create the figures of the paper can be found in the
`final_results_gpt35.tar.bz2` archive. Unpack the archive and run the
file `plots.py`.