diff --git a/graph_of_thoughts/controller/README.md b/graph_of_thoughts/controller/README.md
index b0f24dc..ca05662 100644
--- a/graph_of_thoughts/controller/README.md
+++ b/graph_of_thoughts/controller/README.md
@@ -1,75 +1,18 @@
 # Controller
 
-The Controller class is responsible for traversing the Graph of Operations (GoO), which is a static structure that is constructed once, before the execution starts.
+The Controller class is responsible for traversing the Graph of Operations (GoO), which is a static structure that is constructed once, before the execution starts. The GoO prescribes the execution plan of thought operations, and the Controller invokes their execution, generating the Graph Reasoning State (GRS).
 
-In order for a GoO to be executed, an instance of Large Language Model (LLM) must be supplied to the controller.
-Currently, the framework supports the following LLMs:
-- GPT-4 / GPT-3.5 (Remote - OpenAI API)
-- Llama-2 (Local - HuggingFace Transformers)
+In order for a GoO to be executed, an instance of a Large Language Model (LLM) must be supplied to the controller (along with other required objects).
+Please refer to the [Language Models](../language_models/README.md) section for more information about LLMs.
 
-The following section describes how to instantiate individual LLMs and the Controller to run a defined GoO.
-Furthermore, the process of adding new LLMs into the framework is outlined at the end.
-
-## LLM Instantiation
-- Create a copy of `config_template.json` named `config.json`.
-- Fill configuration details based on the used model (below).
-
-### GPT-4 / GPT-3.5
-- Adjust predefined `chatgpt`, `chatgpt4` or create new configuration with an unique key.
-
-| Key                 | Value |
-|---------------------|-------|
-| model_id            | Model name based on [OpenAI model overview](https://platform.openai.com/docs/models/overview). |
-| prompt_token_cost   | Price per 1000 prompt tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating cumulative price per LLM instance. |
-| response_token_cost | Price per 1000 response tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating cumulative price per LLM instance. |
-| temperature         | Parameter of OpenAI models that controls randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 2.0, default is 1.0. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature). |
-| max_tokens          | The maximum number of tokens to generate in the chat completion. Value depends on the maximum context size of the model specified in the [OpenAI model overview](https://platform.openai.com/docs/models/overview). More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens). |
-| stop                | String or array of strings specifying sequence of characters which if detected, stops further generation of tokens. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-stop). |
-| organization        | Organization to use for the API requests (may be empty). |
-| api_key             | Personal API key that will be used to access OpenAI API. |
-
-- Instantiate the language model based on the selected configuration key (predefined / custom).
-```
-lm = controller.ChatGPT(
-    "path/to/config.json",
-    model_name=
-)
-```
-
-### Llama-2
-- Requires local hardware to run inference and a HuggingFace account.
-- Adjust predefined `llama7b-hf`, `llama13b-hf`, `llama70b-hf` or create a new configuration with an unique key.
-
-| Key                 | Value |
-|---------------------|-------|
-| model_id            | Specifies HuggingFace Llama 2 model identifier (`meta-llama/`). |
-| cache_dir           | Local directory where model will be downloaded and accessed. |
-| prompt_token_cost   | Price per 1000 prompt tokens (currently not used - local model = no cost). |
-| response_token_cost | Price per 1000 response tokens (currently not used - local model = no cost). |
-| temperature         | Parameter that controls randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 1.0, default is 0.6. |
-| top_k               | Top-K sampling method described in [Transformers tutorial](https://huggingface.co/blog/how-to-generate). Default value is set to 10. |
-| max_tokens          | The maximum number of tokens to generate in the chat completion. More tokens require more memory. |
-
-- Instantiate the language model based on the selected configuration key (predefined / custom).
-```
-lm = controller.Llama2HF(
-    "path/to/config.json",
-    model_name=
-)
-```
-- Request access to Llama-2 via the [Meta form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) using the same email address as for the HuggingFace account.
-- After the access is granted, go to [HuggingFace Llama-2 model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), log in and accept the license (_"You have been granted access to this model"_ message should appear).
-- Generate HuggingFace access token.
-- Log in from CLI with: `huggingface-cli login --token `.
-
-Note: 4-bit quantization is used to reduce the model size for inference. During instantiation, the model is downloaded from HuggingFace into the cache directory specified in the `config.json`. Running queries using larger models will require multiple GPUs (splitting across many GPUs is done automatically by the Transformers library).
+The following section describes how to instantiate the Controller to run a defined GoO.
 
 ## Controller Instantiation
-- Requires custom `Prompter`, `Parser` and instantiated `GraphOfOperations` - creation of these is described separately.
-- Use instantiated `lm` from above.
+- Requires custom `Prompter` and `Parser` implementations, as well as instantiated `GraphOfOperations` and `AbstractLanguageModel` objects - the creation of these is described separately.
 - Prepare initial state (thought) as dictionary - this can be used in the initial prompts by the operations.
 
 ```
+lm = ...create
 graph_of_operations = ...create
 executor = controller.Controller(
@@ -83,35 +26,3 @@ executor.run()
 executor.output_graph("path/to/output.json")
 ```
 - After the run the graph is written to an output file, which contains individual operations, their thoughts, information about scores and validity and total amount of used tokens / cost.
-
-## Adding LLMs
-More LLMs can be added by following these steps:
-- Create new class as a subclass of `AbstractLanguageModel`.
-- Use the constructor for loading configuration and instantiating the language model (if needed).
-```
-class CustomLanguageModel(AbstractLanguageModel):
-    def __init__(
-        self,
-        config_path: str = "",
-        model_name: str = "llama7b-hf",
-        cache: bool = False
-    ) -> None:
-        super().__init__(config_path, model_name, cache)
-        self.config: Dict = self.config[model_name]
-
-        # Load data from configuration into variables if needed
-
-        # Instantiate LLM if needed
-```
-- Implement `query` abstract method that is used to get a list of responses from the LLM (call to remote API or local model inference).
-```
-def query(self, query: str, num_responses: int = 1) -> Any:
-    # Support caching
-    # Call LLM and retrieve list of responses - based on num_responses
-    # Return LLM response structure (not only raw strings)
-```
-- Implement `get_response_texts` abstract method that is used to get a list of raw texts from the LLM response structure produced by `query`.
-```
-def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
-    # Retrieve list of raw strings from the LLM response structure
-```
diff --git a/graph_of_thoughts/controller/__init__.py b/graph_of_thoughts/controller/__init__.py
index 921470f..f41157f 100644
--- a/graph_of_thoughts/controller/__init__.py
+++ b/graph_of_thoughts/controller/__init__.py
@@ -1,4 +1 @@
-from .chatgpt import ChatGPT
-from .llamachat_hf import Llama2HF
-from .abstract_language_model import AbstractLanguageModel
 from .controller import Controller
diff --git a/graph_of_thoughts/controller/controller.py b/graph_of_thoughts/controller/controller.py
index f42d519..821beac 100644
--- a/graph_of_thoughts/controller/controller.py
+++ b/graph_of_thoughts/controller/controller.py
@@ -9,7 +9,7 @@
 import json
 import logging
 from typing import List
-from .abstract_language_model import AbstractLanguageModel
+from graph_of_thoughts.language_models import AbstractLanguageModel
 from graph_of_thoughts.operations import GraphOfOperations, Thought
 from graph_of_thoughts.prompter import Prompter
 from graph_of_thoughts.parser import Parser
diff --git a/graph_of_thoughts/language_models/README.md b/graph_of_thoughts/language_models/README.md
new file mode 100644
index 0000000..334d1bb
--- /dev/null
+++ b/graph_of_thoughts/language_models/README.md
@@ -0,0 +1,95 @@
+# Language Models
+
+The Language Models module is responsible for managing the large language models (LLMs) used by the Controller.
+
+Currently, the framework supports the following LLMs:
+- GPT-4 / GPT-3.5 (Remote - OpenAI API)
+- Llama-2 (Local - HuggingFace Transformers)
+
+The following sections describe how to instantiate individual LLMs and how to add new LLMs to the framework.
+
+## LLM Instantiation
+- Create a copy of `config_template.json` named `config.json`.
+- Fill in the configuration details based on the used model (below).
+
+### GPT-4 / GPT-3.5
+- Adjust the predefined `chatgpt` or `chatgpt4` configuration, or create a new configuration with a unique key.
+
+| Key                 | Value |
+|---------------------|-------|
+| model_id            | Model name based on [OpenAI model overview](https://platform.openai.com/docs/models/overview). |
+| prompt_token_cost   | Price per 1000 prompt tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating the cumulative price per LLM instance. |
+| response_token_cost | Price per 1000 response tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating the cumulative price per LLM instance. |
+| temperature         | Parameter of OpenAI models that controls the randomness and creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 2.0, default is 1.0. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature). |
+| max_tokens          | The maximum number of tokens to generate in the chat completion. The value depends on the maximum context size of the model specified in the [OpenAI model overview](https://platform.openai.com/docs/models/overview). More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens). |
+| stop                | String or array of strings specifying sequences of characters which, if detected, stop further generation of tokens. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-stop). |
+| organization        | Organization to use for the API requests (may be empty). |
+| api_key             | Personal API key that will be used to access the OpenAI API. |
+
+- Instantiate the language model based on the selected configuration key (predefined / custom).
+```
+lm = language_models.ChatGPT(
+    "path/to/config.json",
+    model_name=
+)
+```
+
+### Llama-2
+- Requires local hardware to run inference and a HuggingFace account.
+- Adjust the predefined `llama7b-hf`, `llama13b-hf`, `llama70b-hf` configurations, or create a new configuration with a unique key.
+
+| Key                 | Value |
+|---------------------|-------|
+| model_id            | Specifies HuggingFace Llama 2 model identifier (`meta-llama/`). |
+| cache_dir           | Local directory where the model will be downloaded and accessed. |
+| prompt_token_cost   | Price per 1000 prompt tokens (currently not used - local model = no cost). |
+| response_token_cost | Price per 1000 response tokens (currently not used - local model = no cost). |
+| temperature         | Parameter that controls the randomness and creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 1.0, default is 0.6. |
+| top_k               | Top-K sampling method described in the [Transformers tutorial](https://huggingface.co/blog/how-to-generate). Default value is set to 10. |
+| max_tokens          | The maximum number of tokens to generate in the chat completion. More tokens require more memory. |
+
+- Instantiate the language model based on the selected configuration key (predefined / custom).
+```
+lm = language_models.Llama2HF(
+    "path/to/config.json",
+    model_name=
+)
+```
+- Request access to Llama-2 via the [Meta form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/), using the same email address as for the HuggingFace account.
+- After the access is granted, go to the [HuggingFace Llama-2 model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), log in and accept the license (a _"You have been granted access to this model"_ message should appear).
+- Generate a HuggingFace access token.
+- Log in from the CLI with: `huggingface-cli login --token `.
+
+Note: 4-bit quantization is used to reduce the model size for inference. During instantiation, the model is downloaded from HuggingFace into the cache directory specified in `config.json`. Running queries using larger models will require multiple GPUs (splitting across many GPUs is done automatically by the Transformers library).
+
+## Adding LLMs
+More LLMs can be added by following these steps:
+- Create a new class as a subclass of `AbstractLanguageModel`.
+- Use the constructor for loading the configuration and instantiating the language model (if needed).
+```
+class CustomLanguageModel(AbstractLanguageModel):
+    def __init__(
+        self,
+        config_path: str = "",
+        model_name: str = "llama7b-hf",
+        cache: bool = False
+    ) -> None:
+        super().__init__(config_path, model_name, cache)
+        self.config: Dict = self.config[model_name]
+
+        # Load data from configuration into variables if needed
+
+        # Instantiate LLM if needed
+```
+- Implement the `query` abstract method, which is used to get a list of responses from the LLM (a call to a remote API or local model inference).
+```
+def query(self, query: str, num_responses: int = 1) -> Any:
+    # Support caching
+    # Call LLM and retrieve list of responses - based on num_responses
+    # Return LLM response structure (not only raw strings)
+```
+- Implement the `get_response_texts` abstract method, which is used to get a list of raw texts from the LLM response structure produced by `query`.
+```
+def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
+    # Retrieve list of raw strings from the LLM response structure
+```
diff --git a/graph_of_thoughts/language_models/__init__.py b/graph_of_thoughts/language_models/__init__.py
new file mode 100644
index 0000000..076d62e
--- /dev/null
+++ b/graph_of_thoughts/language_models/__init__.py
@@ -0,0 +1,3 @@
+from .abstract_language_model import AbstractLanguageModel
+from .chatgpt import ChatGPT
+from .llamachat_hf import Llama2HF
diff --git a/graph_of_thoughts/controller/abstract_language_model.py b/graph_of_thoughts/language_models/abstract_language_model.py
similarity index 100%
rename from graph_of_thoughts/controller/abstract_language_model.py
rename to graph_of_thoughts/language_models/abstract_language_model.py
diff --git a/graph_of_thoughts/controller/chatgpt.py b/graph_of_thoughts/language_models/chatgpt.py
similarity index 100%
rename from graph_of_thoughts/controller/chatgpt.py
rename to graph_of_thoughts/language_models/chatgpt.py
diff --git a/graph_of_thoughts/controller/config_template.json b/graph_of_thoughts/language_models/config_template.json
similarity index 100%
rename from graph_of_thoughts/controller/config_template.json
rename to graph_of_thoughts/language_models/config_template.json
diff --git a/graph_of_thoughts/controller/llamachat_hf.py b/graph_of_thoughts/language_models/llamachat_hf.py
similarity index 100%
rename from graph_of_thoughts/controller/llamachat_hf.py
rename to graph_of_thoughts/language_models/llamachat_hf.py
diff --git a/graph_of_thoughts/operations/operations.py b/graph_of_thoughts/operations/operations.py
index a5b031b..bb9b31e 100644
--- a/graph_of_thoughts/operations/operations.py
+++ b/graph_of_thoughts/operations/operations.py
@@ -14,7 +14,7 @@
 from abc import ABC, abstractmethod
 import itertools
 
 from graph_of_thoughts.operations.thought import Thought
-from graph_of_thoughts.controller.abstract_language_model import AbstractLanguageModel
+from graph_of_thoughts.language_models import AbstractLanguageModel
 from graph_of_thoughts.prompter import Prompter
 from graph_of_thoughts.parser import Parser
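For reference, the `CustomLanguageModel` skeleton added in the new language-models README can be fleshed out as follows. This is a minimal, self-contained sketch, not framework code: `AbstractLanguageModel` is reduced here to a stub so the example runs standalone (the real base class in `graph_of_thoughts.language_models` also handles loading `config.json` and cost tracking), and `EchoLanguageModel` is a hypothetical model that merely echoes the prompt in place of a real API call or local inference.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Union


class AbstractLanguageModel(ABC):
    """Simplified stand-in for graph_of_thoughts.language_models.AbstractLanguageModel."""

    def __init__(self, config_path: str = "", model_name: str = "", cache: bool = False) -> None:
        self.model_name = model_name
        self.cache = cache
        self.response_cache: Dict[str, Any] = {}
        # The framework reads this from config.json; an inline dict keeps the sketch standalone.
        self.config: Dict = {model_name: {"temperature": 0.6, "max_tokens": 128}}

    @abstractmethod
    def query(self, query: str, num_responses: int = 1) -> Any: ...

    @abstractmethod
    def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]: ...


class EchoLanguageModel(AbstractLanguageModel):
    """Hypothetical LLM subclass that echoes the prompt instead of calling a real model."""

    def __init__(self, config_path: str = "", model_name: str = "echo", cache: bool = False) -> None:
        super().__init__(config_path, model_name, cache)
        # Narrow the configuration to this model's section, as in the README skeleton
        self.config: Dict = self.config[model_name]
        self.temperature: float = self.config["temperature"]

    def query(self, query: str, num_responses: int = 1) -> List[Dict]:
        # Support caching, as the README suggests
        if self.cache and query in self.response_cache:
            return self.response_cache[query]
        # A real implementation would call a remote API or run local inference here,
        # returning the full response structure (not only raw strings)
        response = [{"text": f"echo: {query}"} for _ in range(num_responses)]
        if self.cache:
            self.response_cache[query] = response
        return response

    def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
        # Retrieve the raw strings from the response structure produced by query()
        if isinstance(query_response, dict):
            query_response = [query_response]
        return [r["text"] for r in query_response]
```

Once such a class is wired into the framework proper (subclassing the real base class), an instance of it can be passed to `controller.Controller` in place of `ChatGPT` or `Llama2HF`.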