piedomains.llm package

Submodules

piedomains.llm.config module

LLM configuration for domain classification.

class piedomains.llm.config.LLMConfig(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)[source]

Bases: object

Configuration for LLM-based classification.

provider

LLM provider (e.g., ‘openai’, ‘anthropic’, ‘google’)

model

Model name (e.g., ‘gpt-4o’, ‘claude-3-5-sonnet-20241022’, ‘gemini-1.5-pro’)

api_key

API key for the provider

base_url

Optional base URL for custom endpoints

max_tokens

Maximum number of tokens in the model's response

temperature

Sampling temperature for response generation (lower values give more deterministic output)

categories

List of classification categories

cost_limit_usd

Cost limit in USD for API usage

usage_tracking

Whether to track API usage

provider: str
model: str
api_key: str | None = None
base_url: str | None = None
max_tokens: int = 500
temperature: float = 0.1
categories: list[str] | None = None
cost_limit_usd: float = 10.0
usage_tracking: bool = True
__post_init__()[source]

Validate and set defaults after initialization.

Return type:

None

to_litellm_params()[source]

Convert the configuration to a litellm parameter dictionary.

Return type:

dict[str, Any]

classmethod from_dict(config_dict)[source]

Create LLMConfig from dictionary.

Return type:

LLMConfig

__init__(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)
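
The documented fields and defaults can be illustrated with a minimal sketch. Everything below is hypothetical: `LLMConfigSketch` mirrors the signature shown above, and its `to_litellm_params` assumes litellm's `provider/model` routing convention, which may not match the package's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class LLMConfigSketch:
    """Illustrative stand-in for LLMConfig; field names and defaults
    mirror the documented signature, but the method bodies are guesses."""

    provider: str
    model: str
    api_key: str | None = None
    base_url: str | None = None
    max_tokens: int = 500
    temperature: float = 0.1
    categories: list[str] | None = None
    cost_limit_usd: float = 10.0
    usage_tracking: bool = True

    def to_litellm_params(self) -> dict[str, Any]:
        # litellm routes requests on a "provider/model" string;
        # optional fields are passed through only when set.
        params: dict[str, Any] = {
            "model": f"{self.provider}/{self.model}",
            "max_tokens": self.max_tokens,
            "temperature": self.temperature,
        }
        if self.api_key:
            params["api_key"] = self.api_key
        if self.base_url:
            params["base_url"] = self.base_url
        return params

    @classmethod
    def from_dict(cls, config_dict: dict[str, Any]) -> LLMConfigSketch:
        return cls(**config_dict)


cfg = LLMConfigSketch.from_dict({"provider": "openai", "model": "gpt-4o"})
```

Unspecified fields fall back to the documented defaults, so `cfg` above uses `max_tokens=500` and `temperature=0.1`.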

piedomains.llm.prompts module

Prompt templates for LLM-based domain classification.

piedomains.llm.prompts.get_classification_prompt(domain, content, categories, max_content_length=8000)[source]

Generate classification prompt for text-only analysis.

Parameters:
  • domain (str) – Domain name to classify

  • content (str) – Extracted text content from the domain

  • categories (list[str]) – List of available categories

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string
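
A minimal sketch of what such a prompt builder might look like, assuming the content is truncated to `max_content_length` characters and the model is asked for a JSON answer; `classification_prompt_sketch` is hypothetical, not the package's template.

```python
def classification_prompt_sketch(domain, content, categories,
                                 max_content_length=8000):
    """Illustrative text-only prompt builder; the shipped template differs."""
    # Truncate extracted page text so the prompt fits the context window.
    snippet = content[:max_content_length]
    return (
        f"Classify the website '{domain}' into exactly one of these "
        f"categories: {', '.join(categories)}.\n"
        'Answer with JSON: {"category": "...", "confidence": 0.0, '
        '"reasoning": "..."}\n\n'
        f"Page text:\n{snippet}"
    )


prompt = classification_prompt_sketch(
    "nytimes.com",
    "Breaking news and analysis...",
    ["news", "shopping", "education"],
)
```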

piedomains.llm.prompts.get_multimodal_prompt(domain, content=None, categories=None, has_screenshot=False, max_content_length=6000)[source]

Generate classification prompt for multimodal analysis (text + image).

Parameters:
  • domain (str) – Domain name to classify

  • content (str | None) – Extracted text content (optional)

  • categories (list[str] | None) – List of available categories

  • has_screenshot (bool) – Whether a screenshot image is provided

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string
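
To see why `has_screenshot` matters downstream, here is a hypothetical sketch of how the returned prompt and an optional screenshot could be packaged into the OpenAI-style multimodal message list that litellm accepts; `multimodal_messages_sketch` is not part of the package.

```python
import base64


def multimodal_messages_sketch(prompt, screenshot_png=None):
    """Wrap a prompt (and optional PNG screenshot) in OpenAI-style
    message parts; vision models receive the image as a data URI."""
    parts = [{"type": "text", "text": prompt}]
    if screenshot_png is not None:
        b64 = base64.b64encode(screenshot_png).decode("ascii")
        parts.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": parts}]


msgs = multimodal_messages_sketch("Classify example.com ...", b"\x89PNG fake bytes")
```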

piedomains.llm.prompts.get_custom_prompt(domain, content=None, categories=None, custom_instructions=None, has_screenshot=False)[source]

Generate a custom classification prompt.

Parameters:
  • domain (str) – Domain name to classify

  • content (str | None) – Extracted text content (optional)

  • categories (list[str] | None) – List of available categories

  • custom_instructions (str | None) – Custom classification instructions

  • has_screenshot (bool) – Whether a screenshot image is provided

Return type:

str

Returns:

Formatted prompt string

piedomains.llm.prompts.get_batch_prompt(domains_data, categories)[source]

Generate a batch classification prompt for multiple domains.

Parameters:
  • domains_data (list[dict[str, Any]]) – List of dictionaries, one per domain to classify

  • categories (list[str]) – List of available categories

Return type:

str

Returns:

Formatted batch prompt string
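
A hypothetical sketch of such a batch prompt, assuming each entry in `domains_data` carries a `domain` key and optionally a `content` key (the real key names and template are not documented here):

```python
def batch_prompt_sketch(domains_data, categories):
    """Illustrative batch prompt: a shared instruction header followed
    by one numbered entry per domain."""
    lines = [
        "Classify each website below into one of: " + ", ".join(categories) + ".",
        "Return a JSON array with one object per site, in the same order.",
        "",
    ]
    for i, item in enumerate(domains_data, 1):
        # Keep per-domain snippets short so many domains fit in one request.
        snippet = (item.get("content") or "")[:500]
        lines.append(f"{i}. {item['domain']}: {snippet}")
    return "\n".join(lines)


batch = batch_prompt_sketch(
    [{"domain": "arxiv.org", "content": "Open-access e-prints"},
     {"domain": "etsy.com"}],
    ["science", "shopping"],
)
```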

piedomains.llm.response_parser module

Response parsing utilities for LLM classification results.

piedomains.llm.response_parser.parse_llm_response(response_text)[source]

Parse LLM response into structured classification result.

Parameters:

response_text (str) – Raw response text from LLM

Return type:

dict[str, Any]

Returns:

Dictionary with parsed classification data

Raises:

ValueError – If response cannot be parsed
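
LLM replies often wrap their JSON in markdown fences or surrounding chatter, so a typical parser extracts the first JSON object before decoding and raises `ValueError` when none is found. A hypothetical sketch of that contract (`parse_response_sketch` is not the package's implementation):

```python
import json
import re


def parse_response_sketch(response_text):
    """Illustrative parser: pull the first JSON object out of a model
    reply, tolerating code fences and chatter around it."""
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in LLM response")
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed JSON in LLM response: {exc}") from exc


result = parse_response_sketch(
    '```json\n{"category": "news", "confidence": 0.92}\n```'
)
```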

piedomains.llm.response_parser.parse_batch_response(response_text)[source]

Parse batch LLM response into list of classification results.

Parameters:

response_text (str) – Raw batch response text from LLM

Return type:

list[dict[str, Any]]

Returns:

List of dictionaries with parsed classification data

Raises:

ValueError – If response cannot be parsed
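
Batch parsing follows the same pattern with a JSON array instead of a single object. A hypothetical sketch mirroring the documented contract (`parse_batch_sketch` is not the package's implementation):

```python
import json
import re


def parse_batch_sketch(response_text):
    """Illustrative batch parser: extract the first JSON array from the
    reply, validate its shape, and raise ValueError on failure."""
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON array found in batch response")
    try:
        results = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed JSON in batch response: {exc}") from exc
    if not isinstance(results, list):
        raise ValueError("Batch response is not a JSON array")
    return results


rows = parse_batch_sketch(
    'Here are the results:\n'
    '[{"domain": "arxiv.org", "category": "science"},\n'
    ' {"domain": "etsy.com", "category": "shopping"}]'
)
```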

Module contents

LLM-based classification utilities for piedomains.

class piedomains.llm.LLMConfig(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)[source]

Bases: object

Configuration for LLM-based classification.

provider

LLM provider (e.g., ‘openai’, ‘anthropic’, ‘google’)

model

Model name (e.g., ‘gpt-4o’, ‘claude-3-5-sonnet-20241022’, ‘gemini-1.5-pro’)

api_key

API key for the provider

base_url

Optional base URL for custom endpoints

max_tokens

Maximum number of tokens in the model's response

temperature

Sampling temperature for response generation (lower values give more deterministic output)

categories

List of classification categories

cost_limit_usd

Cost limit in USD for API usage

usage_tracking

Whether to track API usage

__init__(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)
__post_init__()[source]

Validate and set defaults after initialization.

Return type:

None

api_key: str | None = None
base_url: str | None = None
categories: list[str] | None = None
cost_limit_usd: float = 10.0
classmethod from_dict(config_dict)[source]

Create LLMConfig from dictionary.

Return type:

LLMConfig

max_tokens: int = 500
temperature: float = 0.1
to_litellm_params()[source]

Convert the configuration to a litellm parameter dictionary.

Return type:

dict[str, Any]

usage_tracking: bool = True
provider: str
model: str
piedomains.llm.get_classification_prompt(domain, content, categories, max_content_length=8000)[source]

Generate classification prompt for text-only analysis.

Parameters:
  • domain (str) – Domain name to classify

  • content (str) – Extracted text content from the domain

  • categories (list[str]) – List of available categories

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string

piedomains.llm.get_multimodal_prompt(domain, content=None, categories=None, has_screenshot=False, max_content_length=6000)[source]

Generate classification prompt for multimodal analysis (text + image).

Parameters:
  • domain (str) – Domain name to classify

  • content (str | None) – Extracted text content (optional)

  • categories (list[str] | None) – List of available categories

  • has_screenshot (bool) – Whether a screenshot image is provided

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string

piedomains.llm.parse_llm_response(response_text)[source]

Parse LLM response into structured classification result.

Parameters:

response_text (str) – Raw response text from LLM

Return type:

dict[str, Any]

Returns:

Dictionary with parsed classification data

Raises:

ValueError – If response cannot be parsed