3.2 Building Custom Python Agents

Now let’s see how to extend the capabilities of LLMs with tools. This section focuses on deploying a flexible, open-source, Python-based agent using Aviary, an extensible gymnasium for defining agent environments, and LDP, a framework for defining language agents. With these packages you have more freedom to customize agents and to use open-source language models.

All FutureHouse agents including PaperQA2 and Finch are implemented using Aviary and LDP.

Figure 3.2.1: An agent iteratively receives observations from the environment and takes an action based on each observation until the task is completed.
🚀 How to run the notebook

This tutorial can be launched using the rocket (🚀) button at the top of the page.

Option 2: MyBinder

Launches a temporary cloud Jupyter environment directly in your browser.

⚠️ Binder environments can take a few minutes to build and start.

After the notebook loads, create a .env file in the notebook directory containing your API keys:

OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here

Notes

  • You only need API keys for the providers used in a given notebook.

  • Never commit or publicly share your API keys.

  • If a cell fails due to missing credentials, verify that your keys were loaded correctly before rerunning the cell.
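One way to follow the last note is to check which keys are visible to the notebook before running a cell that calls a provider. The sketch below uses only the standard library and reports presence without printing any key values. The helper name report_api_keys is hypothetical, and it assumes any .env file has already been loaded into the environment (for example with load_dotenv).

```python
import os

# Environment variable names from the .env example above
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def report_api_keys(env=None) -> dict[str, bool]:
    """Return {provider: bool} telling whether a non-empty key is set.

    Hypothetical helper: `env` defaults to os.environ, and passing a plain
    dict makes the check easy to try out without touching real keys.
    """
    env = os.environ if env is None else env
    return {
        provider: bool(env.get(var, "").strip())
        for provider, var in PROVIDER_ENV_VARS.items()
    }


if __name__ == "__main__":
    # Only "found" or "missing" is printed, never the key itself
    for provider, found in report_api_keys().items():
        print(f"{provider}: {'found' if found else 'missing'}")
```

A provider reported as missing only matters if the notebook you are running actually uses that provider.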

3.2.1 Problem setup

The goal of this tutorial is to build an AI agent that can analyze protein data and generate hypotheses for drug discovery.

When you understand the basic workflow, you will be able to design and build your own agents.

If you’re using a Google Colab notebook, you can install the requirements by running the cell below.

!uv pip install fhaviary ldp pydantic openai python-dotenv biopython==1.86

Next, set up an OpenAI API key. If you’re using an open-source LLM, replace the OpenAI client and model names in the following cells.

import os

LLM_API_KEYS = {
    "openai":    "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def get_api_key(llm: str = "openai") -> str:
    """
    Load API key for the specified LLM from Colab secrets,
    an environment variable, or a .env file.
    
    Args:
        llm: LLM provider name. eg: 'openai', 'anthropic'
    
    Returns:
        API key string
    
    Example:
        api_key = get_api_key("anthropic")
    """

    llm = llm.lower()
    if llm not in LLM_API_KEYS:
        raise ValueError(
            f"Unknown LLM '{llm}'. Choose from: {list(LLM_API_KEYS.keys())}"
        )

    env_var = LLM_API_KEYS[llm]

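    # 1. Try Colab secrets (only available inside Google Colab)
    try:
        from google.colab import userdata
        key = userdata.get(env_var)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not accessible
        pass

    # 2. Load a .env file if python-dotenv is installed
    try:
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass

    # 3. Try the environment variable (set directly or via the .env file)
    key = os.environ.get(env_var)
    if key:
        return key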

    raise ValueError(
        f"API key not found. Please set {env_var}:\n"
        f"  export {env_var}='your-key-here'\n"
        f"  or add it to a .env file"
    )

# Set the API key as an environment variable
os.environ["OPENAI_API_KEY"] = get_api_key("openai")

3.2.2 Define the tools

As discussed previously, tools are Python functions that our agent can choose to call.

We give each tool a name, a description (so the AI understands what it does), and a list of inputs it expects. In this example we’ll build two analysis tools, plus a small submit_final_answer tool that the agent calls to return its answer:

  • analyze_protein_sequence: computes basic biophysical properties

  • summarize_protein_role: asks the AI to summarize biological context. We’ll use an OpenAI model to do this, but you can also use an open-source LLM here.

Note that every tool you define with Aviary must have a docstring that contains a description of the function. See the example tool definitions below.
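To see why the docstring matters, here is a minimal sketch of the kind of information a tool builder can read from a plain Python function: its name, the first line of its docstring, and its typed parameters. It uses only the standard inspect module and is not Aviary's implementation. Both describe_tool and count_residues are hypothetical names introduced for illustration.

```python
import inspect


def describe_tool(fn) -> dict:
    """Collect the name, description, and parameters of a candidate tool.

    Illustrative only: this is NOT how Aviary builds its Tool objects.
    """
    doc = inspect.getdoc(fn)
    if not doc:
        raise ValueError(f"{fn.__name__} needs a docstring describing what it does")
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            params[name] = "unannotated"
        else:
            # Use the type's name when it has one, else its string form
            params[name] = getattr(param.annotation, "__name__", str(param.annotation))
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0],
        "parameters": params,
    }


def count_residues(sequence: str, residue: str) -> int:
    """Count how often a residue occurs in a protein sequence.

    Args:
        sequence: The protein sequence to search.
        residue: One-letter amino acid code to count.
    """
    return sequence.upper().count(residue.upper())


print(describe_tool(count_residues))
```

Without a docstring there is nothing to tell the model when a tool is appropriate, which is why the sketch rejects undocumented functions.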

Here are brief definitions of key classes and concepts from Aviary and LDP for reference:

From Aviary

  • Message: Used by language agents and environments for communication. Messages include attributes like content and role (system, user, assistant, tool), matching OpenAI’s conventions.

  • Environment: An environment is a stateful system or “world” where an agent operates by taking actions. In Aviary, these actions are called tools. The environment presents states that the agent observes (totally or partially), prompting it to use tools to affect outcomes. Each action taken yields a reward and leads to a new state.

  • Tool: Defines an environmental tool that an agent can use to accomplish its task. Each environment contains its own set of tools. Most tools take arguments and tools can be called in parallel.

  • ToolRequestMessage: A specialized subclass of Message used for tool requests. Typically, a language agent sends a ToolRequestMessage to the environment to request the execution of a specific tool. The role of a ToolRequestMessage is always assistant.

From LDP

  • Agent: An entity that interacts with the environment, mapping observations to tool request actions.

  • Op: Represents an operation within the agent. LDP includes various operations (Ops), such as API LLM calls, API embedding calls, or PyTorch module handling. These operations form the compute graph.

  • OpResult: the output of an Op.
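The concepts above fit together in a simple loop: the environment emits an observation, the agent maps it to a tool request, and the environment returns a new observation, a reward, and a done flag. The toy sketch below shows that loop in plain Python with no Aviary or LDP imports. ToyEnv, toy_agent, and rollout are hypothetical stand-ins, and a hard-coded rule plays the role of the LLM.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Counts up to a target; a stand-in for an Aviary Environment."""

    target: int
    count: int = 0

    def reset(self) -> tuple[str, list[str]]:
        """Return the initial observation and the available tool names."""
        self.count = 0
        return f"count=0, target={self.target}", ["increment", "submit"]

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply one tool call and return (observation, reward, done)."""
        if action == "increment":
            self.count += 1
            return f"count={self.count}, target={self.target}", 0.0, False
        # "submit" ends the episode; reward 1.0 only if the target was reached
        reward = 1.0 if self.count == self.target else 0.0
        return "submitted", reward, True


def toy_agent(observation: str) -> str:
    """Rule-based stand-in for an LLM: map an observation to a tool name."""
    count, target = (int(part.split("=")[1]) for part in observation.split(", "))
    return "increment" if count < target else "submit"


def rollout(env: ToyEnv, max_steps: int = 10) -> list[tuple[str, float]]:
    """Run the observe/act loop until done or until max_steps is reached."""
    obs, _tools = env.reset()
    history = []
    for _ in range(max_steps):
        action = toy_agent(obs)
        obs, reward, done = env.step(action)
        history.append((action, reward))
        if done:
            break
    return history


print(rollout(ToyEnv(target=2)))
```

In the real setup, the rule in toy_agent is replaced by an LLM call that chooses among the tools, and the strings become Message and ToolRequestMessage objects.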

Now let’s write the Python functions that define the tools.

from Bio.SeqUtils.ProtParam import ProteinAnalysis
from openai import OpenAI

# ── TOOL 1: Analyze a protein sequence ──────────────────────────────────────

def analyze_protein_sequence(sequence: str) -> dict:
    """
    A tool to analyze a protein sequence.
    Use when you need to get basic biophysical properties of a protein.
    eg: molecular weight, isoelectric point, instability index, gravy score, etc.

    Args:
        sequence: The protein sequence to analyze
    Returns:
        A dictionary containing the biophysical properties of the protein.
    """

    sequence = sequence.upper().strip()
    analysis = ProteinAnalysis(sequence)

    results = {
        "length":              len(sequence),
        "molecular_weight_Da": round(analysis.molecular_weight(), 2),
        "isoelectric_point":   round(analysis.isoelectric_point(), 2),
        "instability_index":   round(analysis.instability_index(), 2),
        "gravy_score":         round(analysis.gravy(), 3),   # hydrophobicity
        "amino_acid_percent":  {
            aa: round(pct, 1)
            for aa, pct in analysis.amino_acids_percent.items()
            if pct > 0   # only show amino acids actually present
        },
    }

    # Interpret some values for the non-expert
    results["is_stable"]     = results["instability_index"] < 40
    results["is_hydrophilic"] = results["gravy_score"] < 0

    return results


# ── TOOL 2: Summarize protein biological role ────────────────────────────────

def summarize_protein_role(protein_name: str, organism: str, protein_data: dict | None = None) -> str:
    """
    A tool to summarize the biological
    role of a protein from its training knowledge.
    eg: biological function, disease or condition it is associated with, why it is considered a drug target.

    Args:
        protein_name: The name of the protein to summarize
        organism: The organism the protein belongs to
        protein_data: A dictionary containing the protein data
    Returns:
        A string containing the summary of the protein's biological role.
    """

    client = OpenAI(api_key=get_api_key(llm="openai"))
    response = client.responses.create(
        model="gpt-4.1-nano-2025-04-14",
        input=(
            f"Provide a concise 3-4 sentence summary of the protein '{protein_name}' "
            f"in {organism}. Here is the protein data: {protein_data}.\n"
            "Cover: (1) its biological function, "
            "(2) which disease or condition it is associated with, "
            "(3) why it is considered a drug target. Be factual and concise."
        ),
    )

    return response.output_text

def submit_final_answer(answer: str) -> str:
    """
    A tool to submit the final answer to the user.

    Args:
        answer: The answer to the query.
    Returns:
        The submitted answer, unchanged.
    """

    return answer
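To make the gravy_score field less of a black box, the sketch below computes a GRAVY (grand average of hydropathy) score by hand as the mean Kyte-Doolittle hydropathy value over all residues. This is an illustration and not Biopython's code, although it is expected to agree with analysis.gravy() for sequences made of the 20 standard amino acids. The function name gravy_by_hand is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_by_hand(sequence: str) -> float:
    """Mean hydropathy of a sequence; negative values suggest a hydrophilic protein."""
    sequence = sequence.upper().strip()
    if not sequence:
        raise ValueError("sequence must not be empty")
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# A lysine/glutamate-rich peptide scores well below zero (hydrophilic),
# while a leucine/valine-rich one scores well above zero (hydrophobic)
print(round(gravy_by_hand("KEKEKE"), 3))
print(round(gravy_by_hand("LVLVLV"), 3))
```

This is the quantity behind the is_hydrophilic flag in analyze_protein_sequence, which is true when the score is below zero.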

3.2.3 Define the environment

Next we define a simple state and an environment in which an agent takes actions to analyze a protein.

💡 Reminders

The State is a snapshot of the agent’s current situation, i.e., what the agent knows at a given timestep.

The Environment is everything outside the agent itself — it’s the world the agent perceives and acts upon.

from typing import cast
from aviary.core import (
    Environment,
    Message,
    Messages,
    Tool,
    ToolRequestMessage,
    ToolResponseMessage,
)
from pydantic import BaseModel

# The research question itself is sent as a separate message in reset()
SYSTEM_PROMPT = """
You are an expert researcher. You are given a research question, and your task is to answer it. You have access to the following tools:
- analyze_protein_sequence: to analyze the protein
- summarize_protein_role: to summarize the biological role of the protein
- submit_final_answer: to submit the final answer
"""

class DemoEnvState(BaseModel):
    """State of the demo environment."""

    query: str
    answer: str | None = None
    done: bool = False

class DemoAgentEnv(Environment[DemoEnvState]):
    """Environment for the DemoAgent."""

    def __init__(
        self,
        query: str, # the input to the agent
    ):
        self.query = query
        self.tools: list[Tool] = []
        self.messages: Messages | None = None

    def make_initial_state(self) -> DemoEnvState:
        """
        Initialize the state of the environment,
        i.e., where the agent is at the beginning of the task.
        You can add more fields to the state if you want.
        """
        return DemoEnvState(
            query=self.query,
            answer=None,
            done=False
        )
    
    async def reset(self) -> tuple[Messages, list[Tool]]:
        """
        Reset the environment and collect the initial observation(s).
        Possible observations could be instructions
        on how tools are related,
        or the goal of the environment.
        Returns a two-tuple of initial observations and tools.
        """
        self.messages = [
            Message(content=SYSTEM_PROMPT, role="system"),
            Message(content=self.query),
        ]
        self.tools = [
            Tool.from_function(analyze_protein_sequence),
            Tool.from_function(summarize_protein_role),
            Tool.from_function(submit_final_answer),
        ]

        self.state = self.make_initial_state()
        return self.messages, self.tools
    
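    async def step(
        self, action: ToolRequestMessage
    ) -> tuple[Messages, float, bool, bool]:
        """Run the requested tools; return (observations, reward, done, truncated)."""
        # Run the tool calls one at a time; with handle_tool_exc=True, tool
        # errors are reported back to the agent instead of being raised
        response_messages = cast(
            "Messages",
            await self.exec_tool_calls(
                action,
                concurrency=False,
                handle_tool_exc=True,
                state=self.state,
            ),
        ) or [Message(content=f"No tool calls input in tool request {action}.")]

        # The episode ends once submit_final_answer has been called
        done = any(
            isinstance(msg, ToolResponseMessage)
            and msg.name == submit_final_answer.__name__
            for msg in response_messages
        )
        # Keep the most recent tool output for inspection
        self.intermediate_answer = response_messages[-1].content
        if done:
            self.state.done = True

        return (
            response_messages,
            1.0 if self.state.done else 0.0,  # reward only on completion
            self.state.done,
            False,  # this environment never truncates on its own
        )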
   

3.2.4 Initialize the agent

Now that the environment is set up, we need an agent (an LLM) to use the tools and come up with an answer. The ldp package has predefined Simple and ReAct agents that you can use. You can refer to this GitHub repo for more details on agent implementation.

from aviary.core import Message, Tool, ToolRequestMessage
from ldp.agent import Agent
from ldp.alg import RolloutManager
from ldp.graph import LLMCallOp, OpResult
from pydantic import BaseModel, Field


class AgentState(BaseModel):
    """Simple bucket to store available tools and previous messages."""

    tools: list[Tool] = Field(default_factory=list)
    messages: list[Message] = Field(default_factory=list)


class SimpleAgent(Agent):
    def __init__(self, **kwargs: dict) -> None:
        self._llm_call_op = LLMCallOp(**kwargs)

    async def init_state(self, tools: list[Tool]) -> AgentState:
        return AgentState(tools=tools)

    async def get_asv(
        self, agent_state: AgentState, obs: list[Message]
    ) -> tuple[OpResult[ToolRequestMessage], AgentState, float]:
        """Take an action, observe new state, return value."""
        # The LLM call returns an OpResult; its .value is the ToolRequestMessage
        action: OpResult[ToolRequestMessage] = await self._llm_call_op(
            config={"name": "gpt-4o-mini", "temperature": 0.1},
            msgs=agent_state.messages + obs,
            tools=agent_state.tools,
        )
        # Append the new observations and the chosen action to the history
        new_state: AgentState = AgentState(
            messages=agent_state.messages + obs + [action.value],
            tools=agent_state.tools,
        )
        # Return action, state, value
        return action, new_state, 0.0

Let’s instantiate the agent and perform rollouts on the environment!

Note: If max_steps is set, rollouts will be truncated at this value. If a rollout has fewer than max_steps, then a new environment will be constructed and another rollout will be started until max_steps is reached.

# Instantiate the agent
agent = SimpleAgent()
runner = RolloutManager(agent=agent)

# Define the query
query = """What is the biological role of the protein with PDB id 9RIP? Please analyze this protein and help me think about potential small-molecule drug targeting strategies."""

# Perform rollouts
trajectories = await runner.sample_trajectories(  # one trajectory per environment
    environments=[DemoAgentEnv(query=query)], # must be a list of environments. Can add multiple environments to run in parallel
    max_steps = 5, # max number of steps to run for each environment
)

Now let’s print the final answer from the last tool call in the last step. That’s why we’re using steps[-1].

There is only one trajectory here because we ran the agent in a single environment, so trajectories[0] and trajectories[-1] are the same. You can also run multiple environments and get the tool calls from each trajectory.

For example, we can run two environments in parallel with:

trajectories = await runner.sample_trajectories(
    environments=[
        DemoAgentEnv(query=query_1),  # query_1 and query_2 are placeholder query strings
        DemoAgentEnv(query=query_2),
    ],
)

# Get the tool calls from the last step of the last trajectory
tool_calls = trajectories[-1].steps[-1].action.value.tool_calls

# Get the answer from submit_final_answer tool call
for tc in tool_calls:
    if tc.function.name == "submit_final_answer":
        print(tc.function.arguments["answer"])
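If a rollout is truncated at max_steps before the agent calls submit_final_answer, the loop above prints nothing. The sketch below wraps the same lookup in a small helper that returns None in that case, so the caller can tell that no answer was submitted. extract_final_answer and fake_call are hypothetical names, and the SimpleNamespace objects are stand-ins that only mimic the tc.function.name and tc.function.arguments attributes used above.

```python
from types import SimpleNamespace


def extract_final_answer(tool_calls, tool_name: str = "submit_final_answer"):
    """Return the submitted answer, or None if the tool was never called."""
    for tc in tool_calls or []:
        if tc.function.name == tool_name:
            return tc.function.arguments.get("answer")
    return None


def fake_call(name: str, **arguments):
    """Build a stand-in tool call with the attributes the helper reads."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=arguments))


calls = [
    fake_call("analyze_protein_sequence", sequence="MKT"),
    fake_call("submit_final_answer", answer="Example answer"),
]

print(extract_final_answer(calls))      # the submitted answer
print(extract_final_answer(calls[:1]))  # None: no final answer was submitted
```

With real trajectories, the same helper could be applied to trajectories[i].steps[-1].action.value.tool_calls for each trajectory index i.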