Move language model abstractions and implementation into their own module; create a new README; update dependencies

2023-10-18 10:31:26 +09:00 · 2023-10-18 10:31:26 +09:00 · 7441be41ba
commit 7441be41ba
parent 8590e4bd65
10 changed files with 106 additions and 100 deletions
--- a/graph_of_thoughts/controller/README.md
+++ b/graph_of_thoughts/controller/README.md
@@ -3,73 +3,16 @@
 The Controller class is responsible for traversing the Graph of Operations (GoO), which is a static structure that is constructed once, before the execution starts.
 GoO prescribes the execution plan of thought operations and the Controller invokes their execution, generating the Graph Reasoning State (GRS). 

-In order for a GoO to be executed, an instance of Large Language Model (LLM) must be supplied to the controller. 
-Currently, the framework supports the following LLMs:
-- GPT-4 / GPT-3.5 (Remote - OpenAI API)
-- Llama-2 (Local - HuggingFace Transformers) 
+In order for a GoO to be executed, an instance of a Large Language Model (LLM) must be supplied to the Controller, along with a `Prompter`, a `Parser` and the `GraphOfOperations` itself.
+Please refer to the [Language Models](../language_models/README.md) README for more information about LLMs.

-The following section describes how to instantiate individual LLMs and the Controller to run a defined GoO. 
-Furthermore, the process of adding new LLMs into the framework is outlined at the end.
-
-## LLM Instantiation
-- Create a copy of `config_template.json` named `config.json`.
-- Fill configuration details based on the used model (below).
-
-### GPT-4 / GPT-3.5
-- Adjust predefined `chatgpt`,  `chatgpt4` or create new configuration with an unique key.
-
-| Key                 | Value                                                                                                                                                                                                                                                                                                                                                               |
-|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| model_id            | Model name based on [OpenAI model overview](https://platform.openai.com/docs/models/overview).                                                                                                                                                                                                                                                                      |
-| prompt_token_cost   | Price per 1000 prompt tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating cumulative price per LLM instance.                                                                                                                                                                                                                         |
-| response_token_cost | Price per 1000 response tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating cumulative price per LLM instance.                                                                                                                                                                                                                       |
-| temperature         | Parameter of OpenAI models that controls randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 2.0, default is 1.0. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature).     |
-| max_tokens          | The maximum number of tokens to generate in the chat completion. Value depends on the maximum context size of the model specified in the [OpenAI model overview](https://platform.openai.com/docs/models/overview). More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens). |
-| stop                | String or array of strings specifying sequence of characters which if detected, stops further generation of tokens. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-stop).                                                                                                       |
-| organization        | Organization to use for the API requests (may be empty).                                                                                                                                                                                                                                                                                                            |
-| api_key             | Personal API key that will be used to access OpenAI API.                                                                                                                                                                                                                                                                                                            |
-
-- Instantiate the language model based on the selected configuration key (predefined / custom).
-```
-lm = controller.ChatGPT(
-    "path/to/config.json", 
-    model_name=<configuration key>
-)
-```
-
-### Llama-2
-- Requires local hardware to run inference and a HuggingFace account.
-- Adjust predefined `llama7b-hf`, `llama13b-hf`, `llama70b-hf` or create a new configuration with an unique key.
-
-| Key                 | Value                                                                                                                                                                           |
-|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| model_id            | Specifies HuggingFace Llama 2 model identifier (`meta-llama/<model_id>`).                                                                                                       |
-| cache_dir           | Local directory where model will be downloaded and accessed.                                                                                                                    |
-| prompt_token_cost   | Price per 1000 prompt tokens (currently not used - local model = no cost).                                                                                                      |
-| response_token_cost | Price per 1000 response tokens (currently not used - local model = no cost).                                                                                                    |
-| temperature         | Parameter that controls randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 1.0, default is 0.6. |
-| top_k               | Top-K sampling method described in [Transformers tutorial](https://huggingface.co/blog/how-to-generate). Default value is set to 10.                                            |
-| max_tokens          | The maximum number of tokens to generate in the chat completion. More tokens require more memory.                                                                               |
-
-- Instantiate the language model based on the selected configuration key (predefined / custom).
-```
-lm = controller.Llama2HF(
-    "path/to/config.json", 
-    model_name=<configuration key>
-)
-```
-- Request access to Llama-2 via the [Meta form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) using the same email address as for the HuggingFace account.
-- After the access is granted, go to [HuggingFace Llama-2 model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), log in and accept the license (_"You have been granted access to this model"_ message should appear).
-- Generate HuggingFace access token.
-- Log in from CLI with: `huggingface-cli login --token <your token>`.
-
-Note: 4-bit quantization is used to reduce the model size for inference. During instantiation, the model is downloaded from HuggingFace into the cache directory specified in the `config.json`. Running queries using larger models will require multiple GPUs (splitting across many GPUs is done automatically by the Transformers library).
+The following section describes how to instantiate the Controller to run a defined GoO. 

 ## Controller Instantiation
-- Requires custom `Prompter`, `Parser` and instantiated `GraphOfOperations` - creation of these is described separately.
-- Use instantiated `lm` from above.
+- Requires a custom `Prompter` and `Parser`, an instantiated `GraphOfOperations` and a language model (an instance of an `AbstractLanguageModel` subclass); the creation of these is described separately.
 - Prepare the initial state (thought) as a dictionary; the operations can use it in their initial prompts.
 ```
+lm = ...create
 graph_of_operations = ...create

 executor = controller.Controller(
@@ -83,35 +26,3 @@ executor.run()
 executor.output_graph("path/to/output.json")
 ```
 - After the run, the graph is written to an output file that contains the individual operations, their thoughts, information about scores and validity, and the total number of tokens used and the resulting cost.
-
-## Adding LLMs
-More LLMs can be added by following these steps:
-- Create new class as a subclass of `AbstractLanguageModel`.
-- Use the constructor for loading configuration and instantiating the language model (if needed). 
-```
-class CustomLanguageModel(AbstractLanguageModel):
-    def __init__(
-        self,
-        config_path: str = "",
-        model_name: str = "llama7b-hf",
-        cache: bool = False
-    ) -> None:
-        super().__init__(config_path, model_name, cache)
-        self.config: Dict = self.config[model_name]
-        
-        # Load data from configuration into variables if needed
-
-        # Instantiate LLM if needed
-```
-- Implement `query` abstract method that is used to get a list of responses from the LLM (call to remote API or local model inference).
-```
-def query(self, query: str, num_responses: int = 1) -> Any:
-    # Support caching 
-    # Call LLM and retrieve list of responses - based on num_responses    
-    # Return LLM response structure (not only raw strings)    
-```
-- Implement `get_response_texts` abstract method that is used to get a list of raw texts from the LLM response structure produced by `query`.
-```
-def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
-    # Retrieve list of raw strings from the LLM response structure    
-```
--- a/graph_of_thoughts/controller/__init__.py
+++ b/graph_of_thoughts/controller/__init__.py
@@ -1,4 +1 @@
-from .chatgpt import ChatGPT
-from .llamachat_hf import Llama2HF
-from .abstract_language_model import AbstractLanguageModel
 from .controller import Controller
--- a/graph_of_thoughts/controller/controller.py
+++ b/graph_of_thoughts/controller/controller.py
@@ -9,7 +9,7 @@
 import json
 import logging
 from typing import List
-from .abstract_language_model import AbstractLanguageModel
+from graph_of_thoughts.language_models import AbstractLanguageModel
 from graph_of_thoughts.operations import GraphOfOperations, Thought
 from graph_of_thoughts.prompter import Prompter
 from graph_of_thoughts.parser import Parser
--- a/graph_of_thoughts/language_models/README.md
+++ b/graph_of_thoughts/language_models/README.md
@@ -0,0 +1,95 @@
+# Language Models
+
+The Language Models module is responsible for managing the large language models (LLMs) used by the Controller.
+
+Currently, the framework supports the following LLMs:
+- GPT-4 / GPT-3.5 (Remote - OpenAI API)
+- Llama-2 (Local - HuggingFace Transformers) 
+
+The following sections describe how to instantiate individual LLMs and how to add new LLMs to the framework.
+
+## LLM Instantiation
+- Create a copy of `config_template.json` named `config.json`.
+- Fill in the configuration details for the model you use (see below).
+
+### GPT-4 / GPT-3.5
+- Adjust the predefined `chatgpt` or `chatgpt4` configuration, or create a new configuration with a unique key.
+
+| Key                 | Value                                                                                                                                                                                                                                                                                                                                                               |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| model_id            | Model name based on [OpenAI model overview](https://platform.openai.com/docs/models/overview).                                                                                                                                                                                                                                                                      |
+| prompt_token_cost   | Price per 1000 prompt tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating cumulative price per LLM instance.                                                                                                                                                                                                                         |
+| response_token_cost | Price per 1000 response tokens based on [OpenAI pricing](https://openai.com/pricing), used for calculating cumulative price per LLM instance.                                                                                                                                                                                                                       |
+| temperature         | Parameter of OpenAI models that controls randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 2.0, default is 1.0. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature).     |
+| max_tokens          | The maximum number of tokens to generate in the chat completion. Value depends on the maximum context size of the model specified in the [OpenAI model overview](https://platform.openai.com/docs/models/overview). More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens). |
+| stop                | String or array of strings specifying character sequences that, when generated, stop further token generation. More information can be found in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-stop). |
+| organization        | Organization to use for the API requests (may be empty).                                                                                                                                                                                                                                                                                                            |
+| api_key             | Personal API key used to access the OpenAI API. |
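As a rough illustration of how the two price keys could feed into a cumulative cost, the sketch below loads one entry from `config.json`, checks that it has the keys listed in the table, and converts token counts into a price. It assumes `config.json` is a JSON object keyed by configuration name (as the `self.config[model_name]` lookup in the constructor example under "Adding LLMs" suggests); `load_model_config`, `cumulative_cost` and `REQUIRED_KEYS` are hypothetical helpers, not part of the framework, whose own bookkeeping may differ.

```python
import json
from typing import Any, Dict

# Keys from the GPT-4 / GPT-3.5 table; hypothetical helper, not part of the framework.
REQUIRED_KEYS = {
    "model_id", "prompt_token_cost", "response_token_cost",
    "temperature", "max_tokens", "stop", "organization", "api_key",
}


def load_model_config(config_path: str, model_name: str) -> Dict[str, Any]:
    """Return the entry stored under `model_name` (assumes config.json maps keys to entries)."""
    with open(config_path, "r") as f:
        config = json.load(f)
    entry = config[model_name]  # raises KeyError for an unknown configuration key
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError(f"configuration {model_name!r} is missing keys: {sorted(missing)}")
    return entry


def cumulative_cost(entry: Dict[str, Any], prompt_tokens: int, response_tokens: int) -> float:
    """Both prices in the configuration are per 1000 tokens."""
    return (
        prompt_tokens / 1000 * entry["prompt_token_cost"]
        + response_tokens / 1000 * entry["response_token_cost"]
    )
```

For example, with placeholder prices (not actual OpenAI pricing) of 0.01 per 1000 prompt tokens and 0.02 per 1000 response tokens, 1500 prompt tokens and 500 response tokens cost 0.015 + 0.01 = 0.025.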
+
+- Instantiate the language model based on the selected configuration key (predefined / custom).
+```
+from graph_of_thoughts import language_models
+
+lm = language_models.ChatGPT(
+    "path/to/config.json",
+    model_name="chatgpt"  # or any other configuration key from config.json
+)
+```
+
+### Llama-2
+- Requires local hardware to run inference and a HuggingFace account.
+- Adjust the predefined `llama7b-hf`, `llama13b-hf` or `llama70b-hf` configuration, or create a new configuration with a unique key.
+
+| Key                 | Value                                                                                                                                                                           |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| model_id            | Specifies HuggingFace Llama 2 model identifier (`meta-llama/<model_id>`).                                                                                                       |
+| cache_dir           | Local directory into which the model is downloaded and from which it is loaded. |
+| prompt_token_cost   | Price per 1000 prompt tokens (currently not used - local model = no cost).                                                                                                      |
+| response_token_cost | Price per 1000 response tokens (currently not used - local model = no cost).                                                                                                    |
+| temperature         | Parameter that controls randomness and the creativity of the responses (higher temperature = more diverse and unexpected responses). Value between 0.0 and 1.0, default is 0.6. |
+| top_k               | Top-K sampling method described in [Transformers tutorial](https://huggingface.co/blog/how-to-generate). Default value is set to 10.                                            |
+| max_tokens          | The maximum number of tokens to generate in the chat completion. More tokens require more memory.                                                                               |
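To make the `temperature` and `top_k` parameters more concrete, here is a toy, pure-Python sketch of temperature-scaled top-k sampling over a hypothetical dictionary of next-token logits, using the defaults from the table (0.6 and 10). It only illustrates the idea described in the linked Transformers tutorial; it is not the Transformers implementation, which operates on tensors, and `sample_top_k` is a hypothetical name.

```python
import math
import random
from typing import Dict, Optional


def sample_top_k(
    logits: Dict[str, float],
    top_k: int = 10,
    temperature: float = 0.6,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token: keep the top_k highest logits, rescale by temperature, then sample."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # 1. Keep only the k highest-scoring candidates.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 2. Divide by the temperature: values below 1.0 sharpen the distribution, above 1.0 flatten it.
    scaled = [logit / temperature for _, logit in top]
    # 3. Softmax weights over the survivors (subtract the max for numerical stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # 4. Draw one token proportionally to its weight.
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With `top_k=1` the sketch always returns the single highest-scoring token; larger `top_k` and higher `temperature` both let less likely tokens through, which is what makes responses more diverse.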
+
+- Request access to Llama-2 via the [Meta form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) using the same email address as for the HuggingFace account.
+- After access is granted, go to the [HuggingFace Llama-2 model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), log in and accept the license (a _"You have been granted access to this model"_ message should appear).
+- Generate a HuggingFace access token.
+- Log in from the CLI with: `huggingface-cli login --token <your token>`.
+- Instantiate the language model based on the selected configuration key (predefined / custom).
+```
+from graph_of_thoughts import language_models
+
+lm = language_models.Llama2HF(
+    "path/to/config.json",
+    model_name="llama7b-hf"  # or any other configuration key from config.json
+)
+```
+
+Note: 4-bit quantization is used to reduce the model size for inference. During instantiation, the model is downloaded from HuggingFace into the cache directory specified in `config.json`. Running queries with larger models will require multiple GPUs (the Transformers library splits the model across them automatically).
+
+## Adding LLMs
+More LLMs can be added by following these steps:
+- Create a new class as a subclass of `AbstractLanguageModel`.
+- Use the constructor to load the configuration and instantiate the language model (if needed).
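+```
+from typing import Dict
+
+from graph_of_thoughts.language_models import AbstractLanguageModel
+
+
+class CustomLanguageModel(AbstractLanguageModel):
+    def __init__(
+        self,
+        config_path: str = "",
+        model_name: str = "llama7b-hf",
+        cache: bool = False
+    ) -> None:
+        super().__init__(config_path, model_name, cache)
+        # self.config is populated by the base class from config_path; keep only this model's entry
+        self.config: Dict = self.config[model_name]
+
+        # Load data from configuration into variables if needed
+
+        # Instantiate LLM if needed
+```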
+- Implement the abstract `query` method, which is used to get a list of responses from the LLM (a call to a remote API or local model inference).
+```
+# Method of CustomLanguageModel; requires `from typing import Any`
+def query(self, query: str, num_responses: int = 1) -> Any:
+    # Support caching
+    # Call LLM and retrieve list of responses - based on num_responses
+    # Return LLM response structure (not only raw strings)
+    ...
+```
+- Implement the abstract `get_response_texts` method, which is used to get a list of raw texts from the LLM response structure produced by `query`.
+```
+# Method of CustomLanguageModel; requires `from typing import Dict, List, Union`
+def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
+    # Retrieve list of raw strings from the LLM response structure
+    ...
+```
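Putting the three steps together, the following self-contained sketch implements a toy `EchoLanguageModel` that echoes the prompt instead of calling a real LLM. To keep it runnable on its own, it defines a simplified, hypothetical stand-in for `AbstractLanguageModel` (no configuration loading); in the framework you would import the real class from `graph_of_thoughts.language_models` instead. The cache keyed on `(query, num_responses)` and the `{"text": ...}` response structure are assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple, Union


class AbstractLanguageModel(ABC):
    """Hypothetical, simplified stand-in for the framework's base class (no config loading)."""

    def __init__(self, config_path: str = "", model_name: str = "", cache: bool = False) -> None:
        self.config_path = config_path
        self.model_name = model_name
        self.cache = cache

    @abstractmethod
    def query(self, query: str, num_responses: int = 1) -> Any: ...

    @abstractmethod
    def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]: ...


class EchoLanguageModel(AbstractLanguageModel):
    """Toy model that echoes the prompt instead of calling a real LLM."""

    def __init__(self, config_path: str = "", model_name: str = "echo", cache: bool = False) -> None:
        super().__init__(config_path, model_name, cache)
        self._responses: Dict[Tuple[str, int], List[Dict]] = {}  # cache keyed on (query, num_responses)
        self.calls = 0  # number of non-cached "LLM" invocations

    def query(self, query: str, num_responses: int = 1) -> List[Dict]:
        key = (query, num_responses)
        if self.cache and key in self._responses:
            return self._responses[key]
        self.calls += 1
        # A real model would call a remote API or run local inference here.
        response = [{"text": f"{query} (response {i})"} for i in range(num_responses)]
        if self.cache:
            self._responses[key] = response
        return response

    def get_response_texts(self, query_response: Union[List[Dict], Dict]) -> List[str]:
        # Accept either a single response dict or a list of them.
        if isinstance(query_response, dict):
            query_response = [query_response]
        return [r["text"] for r in query_response]
```

Returning a structured response from `query` and extracting plain strings only in `get_response_texts` keeps any model-specific metadata available to callers that need it, while operations that only want text can ignore it.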
--- a/graph_of_thoughts/language_models/__init__.py
+++ b/graph_of_thoughts/language_models/__init__.py
@@ -0,0 +1,3 @@
+from .abstract_language_model import AbstractLanguageModel
+from .chatgpt import ChatGPT
+from .llamachat_hf import Llama2HF
--- a/graph_of_thoughts/language_models/abstract_language_model.py
+++ b/graph_of_thoughts/language_models/abstract_language_model.py
--- a/graph_of_thoughts/language_models/chatgpt.py
+++ b/graph_of_thoughts/language_models/chatgpt.py
--- a/graph_of_thoughts/language_models/config_template.json
+++ b/graph_of_thoughts/language_models/config_template.json
--- a/graph_of_thoughts/language_models/llamachat_hf.py
+++ b/graph_of_thoughts/language_models/llamachat_hf.py
--- a/graph_of_thoughts/operations/operations.py
+++ b/graph_of_thoughts/operations/operations.py
@@ -14,7 +14,7 @@ from abc import ABC, abstractmethod
 import itertools

 from graph_of_thoughts.operations.thought import Thought
-from graph_of_thoughts.controller.abstract_language_model import AbstractLanguageModel
+from graph_of_thoughts.language_models import AbstractLanguageModel
 from graph_of_thoughts.prompter import Prompter
 from graph_of_thoughts.parser import Parser