auto_research.survey.prompts module

class SurveyPrompt[source]

Bases: LLM_utils.prompter.PromptBase

A class for generating prompts to automatically survey research articles.

This class provides methods to generate prompts for extracting and analyzing different sections of research papers, including the abstract, introduction, discussion, conclusion, and algorithms. It also supports summarizing and explaining computer science papers.

general_text_cleaning

A list of text cleaning instructions applied across all prompt types.

Type:

list[str]

Example

>>> prompt = SurveyPrompt()
>>> prompt.extract_abstract("Sample paper text...")
>>> print(prompt.prompt)  # Access the generated prompt
general_text_cleaning: list[str] = ['Your answer should not include any references marks, author information, page number, publisher information, figure captions, or table captions.', 'Your answer should not paraphrase or summarize the text, but should be a direct extraction of text.']
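The class internals are not shown in this reference. As a minimal sketch of how a shared cleaning list like this could be appended to every extraction prompt (the function name and prompt wording below are illustrative, not the library's actual implementation):

```python
# Hypothetical sketch -- not SurveyPrompt's real internals.
general_text_cleaning = [
    "Your answer should not include any references marks, author information, "
    "page number, publisher information, figure captions, or table captions.",
    "Your answer should not paraphrase or summarize the text, but should be a "
    "direct extraction of text.",
]

def build_extraction_prompt(section: str, raw_text: str) -> list[str]:
    """Assemble a section-extraction prompt plus the shared cleaning rules."""
    prompt = [f"Extract the {section} from the following text:", raw_text]
    prompt.extend(general_text_cleaning)  # same rules for every section
    return prompt

prompt = build_extraction_prompt("abstract", "Sample paper text...")
```

Because the cleaning instructions are stored once on the class, every extract_* method can reuse them without duplication.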
extract_abstract(raw_text)[source]

Generate a prompt for extracting the abstract from raw text.

Parameters:

raw_text (str) – The raw text extracted from the first 2 pages of a PDF file.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.extract_abstract("Paper text including abstract...")
extract_introduction(raw_text)[source]

Generate a prompt for extracting the introduction from raw text.

Parameters:

raw_text (str) – The raw text extracted from the first 5 pages of a PDF file.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.extract_introduction("Paper text including introduction...")
extract_discussion(raw_text)[source]

Generate a prompt for extracting the discussion section from raw text.

Parameters:

raw_text (str) – The raw text extracted from the last 3 pages of a PDF file.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.extract_discussion("Paper text including discussion...")
extract_conclusion(raw_text)[source]

Generate a prompt for extracting the conclusion from raw text.

Parameters:

raw_text (str) – The raw text extracted from the last 3 pages of a PDF file.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.extract_conclusion("Paper text including conclusion...")
extract_algorithm(raw_text)[source]

Generate a prompt for extracting algorithms from raw text.

Parameters:

raw_text (str) – The raw text extracted from the first 12 pages of a PDF file.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.extract_algorithm("Paper text including algorithms...")
explain_algorithm(paper, algorithm)[source]

Generate a prompt to explain an algorithm using paper text.

Parameters:
  • paper (str) – The text from the first 12 pages of the paper.

  • algorithm (str) – The algorithm text to be explained.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.explain_algorithm("Paper text...", "Algorithm description...")
summarize_default_computer_science(abstract, introduction, discussion, conclusion)[source]

Generate a prompt to comprehensively summarize a computer science paper.

Parameters:
  • abstract (str) – The paper’s abstract text.

  • introduction (str) – The paper’s introduction text.

  • discussion (str) – The paper’s discussion text.

  • conclusion (str) – The paper’s conclusion text.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.summarize_default_computer_science(
...     "Abstract text...",
...     "Introduction text...",
...     "Discussion text...",
...     "Conclusion text...",
... )
explain_default_computer_science(abstract, introduction, discussion, conclusion, user_question, past_response)[source]

Generate a prompt to comprehensively explain a computer science paper. This method carries forward the user's question and the LLM's response from the previous conversation turn.

Parameters:
  • abstract (str) – The paper’s abstract text.

  • introduction (str) – The paper’s introduction text.

  • discussion (str) – The paper’s discussion text.

  • conclusion (str) – The paper’s conclusion text.

  • user_question (str) – The user’s question about the paper.

  • past_response (Optional[str]) – The LLM’s response from the past conversation.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.explain_default_computer_science(
...     "Abstract text...",
...     "Introduction text...",
...     "Discussion text...",
...     "Conclusion text...",
...     "User question...",
...     "Past response...",
... )
information_retrieval(raw_extracted_text, designated_information)[source]

Generate a prompt to retrieve specific information from raw extracted text.

Parameters:
  • raw_extracted_text (str) – The raw text extracted from a PDF file.

  • designated_information (str) – The specific information the user wants to retrieve.

Return type:

None

Example

>>> prompt = SurveyPrompt()
>>> prompt.information_retrieval("Raw text...", "Designated information...")
__init__()

Initialize a new PromptBase instance with an empty prompt list and conversation history.

Example

>>> prompt_base = PromptBase()
>>> prompt_base.prompt
None
>>> prompt_base.conversation_history
[]
Return type:

None

conversation_prompting(sequence_assembler, current_input)

This method is a template for assembling prompts for conversation-based sequence generation. Conversation-based methods generate a new output sequence through conversational interaction between the user and the LLM.

Formula:

Y_i = f_lambda(Phi_i(X_<=i, Y_<i))

Explanation:
  • Y_i: The output sequence generated in the current (`i`th) iteration.

  • f_lambda: The LLM.

  • Phi_i (sequence_assembler): Combines:
    • X_<=i: Input sequences from all iterations (inputs).

    • Y<i: Output sequences from all previous iterations (previous_outputs).

Parameters:
  • sequence_assembler (callable) – A function that takes all inputs and previous outputs and assembles the input for the current iteration.

  • current_input (str) – The raw current input sequence provided for the current iteration.

Returns:

The assembled input sequence for the current iteration.

Return type:

str

Example

>>> def example_sequence_assembler(inputs, past_outputs):
...     return " | ".join(inputs) + " | " + " | ".join(past_outputs)
...
>>> processor = PromptBase()
>>> processor.input_history = ["X0"]
>>> processor.output_history = ["Y1", "Y2"]
>>> current_input = "X_current"
>>> assembled_input = processor.conversation_prompting(
...     sequence_assembler=example_sequence_assembler,
...     current_input=current_input
... )
>>> print(assembled_input)
X0 | X_current | Y1 | Y2
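Reading the formula and the doctest together, the assembly step can be sketched as a standalone function. `conversation_assemble` is a hypothetical name for illustration; the real method reads the histories from the PromptBase instance:

```python
def conversation_assemble(sequence_assembler, input_history, output_history,
                          current_input):
    # Phi_i(X_<=i, Y_<i): all inputs, including the current one, plus all
    # previous outputs are handed to the assembler.
    inputs = input_history + [current_input]
    return sequence_assembler(inputs, output_history)

def example_sequence_assembler(inputs, past_outputs):
    return " | ".join(inputs) + " | " + " | ".join(past_outputs)

result = conversation_assemble(
    example_sequence_assembler, ["X0"], ["Y1", "Y2"], "X_current"
)
# result == "X0 | X_current | Y1 | Y2", matching the doctest above
```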
static formatted_to_string_OpenAI(formatted_prompt)

Convert a formatted prompt (list of dictionaries with ‘role’ and ‘content’ keys) into a single concatenated string.

This method takes a list of prompt segments in the OpenAI format and combines them into a single string. The ‘role’ is ignored, and only the ‘content’ is used.

Parameters:

formatted_prompt (list[dict[str, str]]) – A list of dictionaries where each dictionary contains ‘role’ and ‘content’ keys.

Returns:

A single concatenated string of all the ‘content’ values.

Return type:

str

Example

>>> base = PromptBase()
>>> formatted_prompt = [
...     {"role": "system", "content": "First prompt"},
...     {"role": "user", "content": "Second prompt"}
... ]
>>> result = base.formatted_to_string_OpenAI(formatted_prompt)
>>> result
'First prompt\nSecond prompt'
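A sketch of the conversion, assuming only what the description states (roles are dropped, contents are joined with newlines); `formatted_to_string` is an illustrative stand-in for the real static method:

```python
def formatted_to_string(formatted_prompt):
    # Join only the 'content' values; the 'role' key is ignored.
    return "\n".join(segment["content"] for segment in formatted_prompt)

s = formatted_to_string([
    {"role": "system", "content": "First prompt"},
    {"role": "user", "content": "Second prompt"},
])
# s == "First prompt\nSecond prompt"
```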
general_interactive_prompting(sequence_assembler, general_evaluator, current_input)

This method is a template for assembling prompts for interactive sequence generation in the most general case. It can be considered a combination of iterative_prompting and conversation_prompting.

The method hasn’t been tested yet.

Compared to iterative_prompting, it additionally allows other past input sequences, evaluation of the input sequences, and the current input sequence as elements to be assembled.

Compared to conversation_prompting, it additionally allows the evaluation of the sequences.

Formula:

Y_i = f_lambda(Phi_i(X_<=i, Y_<i, E(X_<=i, Y_<i)))

Explanation:
  • Y_i: The output sequence generated in the current (`i`th) iteration.

  • f_lambda: The LLM.

  • Phi_i (sequence_assembler): Combines:
    • X_<=i: Input sequences from all iterations (inputs).

    • Y_<i: Output sequences from all previous iterations (previous_outputs).

    • E(X_<=i, Y_<i): Evaluations of the input sequences from all iterations and the output sequences from previous iterations.

Parameters:
  • sequence_assembler (callable) – A function that takes all inputs, previous outputs, and their evaluations, and assembles the input for the current iteration.

  • general_evaluator (callable) – A function that evaluates all inputs and past outputs, returning evaluations for each as a list of two elements.

  • current_input (str) – The raw current input sequence provided for the current iteration.

Returns:

The assembled input sequence for the current iteration.

Return type:

str

Example

>>> def example_sequence_assembler(inputs, past_outputs, evaluations):
...     inputs_eval, outputs_eval = evaluations
...     return " | ".join(inputs) + " | " + " | ".join(past_outputs) + " | " + " | ".join(map(str, inputs_eval)) + " | " + " | ".join(map(str, outputs_eval))
...
>>> def example_general_evaluator(inputs, outputs):
...     inputs_eval = [len(inp) for inp in inputs]
...     outputs_eval = [len(out) for out in outputs]
...     return [inputs_eval, outputs_eval]
...
>>> processor = PromptBase()
>>> processor.input_history = ["X0"]  # Initial input history
>>> processor.output_history = ["Y1", "Y2"]  # Outputs from previous iterations
>>> current_input = "X_current"
>>> assembled_input = processor.general_interactive_prompting(
...     sequence_assembler=example_sequence_assembler,
...     general_evaluator=example_general_evaluator,
...     current_input=current_input
... )
>>> print(assembled_input)
X0 | X_current | Y1 | Y2 | 2 | 9 | 2 | 2
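The assembly described by the formula can be sketched as a standalone function; `general_interactive_assemble` is a hypothetical name, and the real method reads the histories from the instance:

```python
def general_interactive_assemble(sequence_assembler, general_evaluator,
                                 input_history, output_history, current_input):
    # X_<=i: all inputs including the current one.
    inputs = input_history + [current_input]
    # E(X_<=i, Y_<i): evaluate all inputs and all previous outputs.
    evaluations = general_evaluator(inputs, output_history)
    # Phi_i: combine inputs, previous outputs, and their evaluations.
    return sequence_assembler(inputs, output_history, evaluations)

def example_sequence_assembler(inputs, past_outputs, evaluations):
    inputs_eval, outputs_eval = evaluations
    return (" | ".join(inputs) + " | " + " | ".join(past_outputs) + " | "
            + " | ".join(map(str, inputs_eval)) + " | "
            + " | ".join(map(str, outputs_eval)))

def example_general_evaluator(inputs, outputs):
    # Simple evaluation: the length of each sequence.
    return [[len(inp) for inp in inputs], [len(out) for out in outputs]]

result = general_interactive_assemble(
    example_sequence_assembler, example_general_evaluator,
    ["X0"], ["Y1", "Y2"], "X_current",
)
# result == "X0 | X_current | Y1 | Y2 | 2 | 9 | 2 | 2", as in the doctest
```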
iterative_prompting(sequence_assembler, output_evaluator)

This method is a template for assembling prompts for iterative sequence generation. Iterative methods generate a new output sequence by assembling the current iteration's input from previous iterations' outputs and their evaluations.

The method hasn’t been tested yet.

Formula:

Y_i = f_lambda(Phi_i(X_0, Y_<i, E(Y_<i)))

Explanation:
  • Y_i is the output sequence generated in the current (`i`th) iteration.

  • f_lambda is the LLM.

  • Phi_i (sequence_assembler) is the mechanism that combines:
    • X_0: The initial input sequence

    • Y_<i: Output sequences from all previous iterations

    • E(Y_<i): Evaluations of the output sequences from previous iterations

  • The assembled input (assembled_input) from Phi_i is passed to f_lambda to generate the next output.

Parameters:
  • sequence_assembler (callable) – A function that takes the initial input, the list of previous outputs, and the list of evaluations of previous outputs, and assembles the input for the current iteration.

  • output_evaluator (callable) – A function that takes the list of outputs and produces evaluations for them.

Returns:

The assembled input sequence for the current iteration.

Return type:

str

Example

>>> def example_sequence_assembler(initial_input, previous_outputs, evaluations):
...     return initial_input + " | " + " | ".join(previous_outputs) + " | " + " | ".join(map(str, evaluations))
...
>>> def example_output_evaluator(outputs):
...     return [len(output) for output in outputs]  # Simple evaluation: output length
...
>>> processor = PromptBase()
>>> processor.input_history = ["X0"]  # Initial input
>>> processor.output_history = ["Y1", "Y2"]  # Outputs from previous iterations
>>> assembled_input = processor.iterative_prompting(
...     sequence_assembler=example_sequence_assembler,
...     output_evaluator=example_output_evaluator
... )
>>> print(assembled_input)
X0 | Y1 | Y2 | 2 | 2
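The iterative assembly can likewise be sketched as a standalone function; `iterative_assemble` is an illustrative name, and in the real method the histories live on the instance:

```python
def iterative_assemble(sequence_assembler, output_evaluator,
                       input_history, output_history):
    initial_input = input_history[0]              # X_0
    evaluations = output_evaluator(output_history)  # E(Y_<i)
    # Phi_i: combine X_0, Y_<i, and E(Y_<i).
    return sequence_assembler(initial_input, output_history, evaluations)

def example_sequence_assembler(initial_input, previous_outputs, evaluations):
    return (initial_input + " | " + " | ".join(previous_outputs) + " | "
            + " | ".join(map(str, evaluations)))

def example_output_evaluator(outputs):
    return [len(output) for output in outputs]  # evaluation: output length

result = iterative_assemble(
    example_sequence_assembler, example_output_evaluator, ["X0"], ["Y1", "Y2"]
)
# result == "X0 | Y1 | Y2 | 2 | 2", matching the doctest above
```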
static list_to_formatted_OpenAI(prompt_as_list)

Format a list of prompt strings into a format compatible with the OpenAI interface.

This method takes a list of strings and converts them into the format expected by GPT-style models, where each prompt segment is represented as a dictionary with ‘role’ and ‘content’ keys. For all segments after the first one, a newline character is prepended to the content.

Parameters:

prompt_as_list (list[str]) – A list of strings representing the prompt segments to be formatted.

Returns:

A list of dictionaries where each dictionary contains 'role' and 'content' keys.

Return type:

list[dict[str, str]]

Example

>>> base = PromptBase()
>>> result = base.list_to_formatted_OpenAI(["First prompt", "Second prompt"])
>>> result[0]["content"]
'First prompt'
>>> result[1]["content"]
'\nSecond prompt'
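A sketch consistent with the documented behavior. The 'role' assigned to each segment is not stated in this reference, so "user" below is an assumption, and `list_to_formatted` is an illustrative name:

```python
def list_to_formatted(prompt_as_list):
    # Each segment becomes a {'role', 'content'} dict; every segment after
    # the first gets a leading newline, per the documented behavior.
    return [
        {"role": "user", "content": seg if i == 0 else "\n" + seg}
        for i, seg in enumerate(prompt_as_list)
    ]

msgs = list_to_formatted(["First prompt", "Second prompt"])
# msgs[0]["content"] == 'First prompt'
# msgs[1]["content"] == '\nSecond prompt'
```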
print_prompt()

Print the content of all formatted prompts.

This method iterates through the stored prompts and prints their content to standard output.

Example

>>> base = PromptBase()
>>> base.prompt = [{"role": "system", "content": "Test prompt"}]
>>> base.print_prompt()
Test prompt
Return type:

None

static sequence_assembler_default(inputs, previous_outputs)

A default sequence assembler that concatenates the input and previous outputs for conversation-based methods with formatted separators and spacing.

Parameters:
  • inputs (list[str]) – The input sequences from all iterations, including the current input.

  • previous_outputs (list[str]) – The output sequences from all previous iterations.

Returns:

The assembled input sequence for the current iteration in a nicely formatted string.

Return type:

str

Example

>>> inputs = ["What did I just say?", "Hi, my name is Tom"]
>>> previous_outputs = ["Nice to meet you, Tom"]
>>> result = PromptBase.sequence_assembler_default(inputs, previous_outputs)
>>> print(result)
------ From the user: ------
Hi, my name is Tom

------ Your response: ------
Nice to meet you, Tom

------ From the user: ------
What did I just say?
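The interleaving order is only shown by example. Reading it off the doctest above (note that the current input is the first element of `inputs` but is rendered last, so the inputs are walked in reverse), a hypothetical reconstruction:

```python
def assemble_default(inputs, previous_outputs):
    # Assumption from the doctest: inputs[0] is the CURRENT input, later
    # elements are older turns, and the transcript runs oldest-first.
    ordered_inputs = list(reversed(inputs))
    parts = []
    for i, user_turn in enumerate(ordered_inputs):
        parts.append(f"------ From the user: ------\n{user_turn}")
        if i < len(previous_outputs):
            parts.append(f"------ Your response: ------\n{previous_outputs[i]}")
    return "\n\n".join(parts)

result = assemble_default(
    ["What did I just say?", "Hi, my name is Tom"],
    ["Nice to meet you, Tom"],
)
```

This interleaving reconstructs the conversation in chronological order, so the LLM sees each of its earlier responses immediately after the user turn that prompted it.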