piedomains.llm package

Submodules

piedomains.llm.config module

LLM configuration for domain classification.

class piedomains.llm.config.LLMConfig(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)[source]

Bases: object

Configuration for LLM-based classification.

provider

LLM provider (e.g., ‘openai’, ‘anthropic’, ‘google’)

model

Model name (e.g., ‘gpt-4o’, ‘claude-3-5-sonnet-20241022’, ‘gemini-1.5-pro’)

api_key

API key for the provider

base_url

Optional base URL for custom endpoints

max_tokens

Maximum number of tokens in the model's response

temperature

Sampling temperature for response generation (lower values give more deterministic output)

categories

List of classification categories

cost_limit_usd

Cost limit in USD for API usage

usage_tracking

Whether to track API usage

provider: str
model: str
api_key: str | None = None
base_url: str | None = None
max_tokens: int = 500
temperature: float = 0.1
categories: list[str] | None = None
cost_limit_usd: float = 10.0
usage_tracking: bool = True
__post_init__()[source]

Validate and set defaults after initialization.

Return type:

None

to_litellm_params()[source]

Convert the configuration to a litellm parameter dictionary.

Return type:

dict[str, Any]

classmethod from_dict(config_dict)[source]

Create LLMConfig from dictionary.

Return type:

LLMConfig

__init__(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)
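
The documented fields and defaults can be illustrated with a minimal sketch. Everything below is hypothetical: `LLMConfigSketch` mirrors the signature shown above, and its `to_litellm_params` assumes litellm's `provider/model` routing convention, which may not match the package's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class LLMConfigSketch:
    """Illustrative stand-in for LLMConfig; field names and defaults
    mirror the documented signature, but the method bodies are guesses."""

    provider: str
    model: str
    api_key: str | None = None
    base_url: str | None = None
    max_tokens: int = 500
    temperature: float = 0.1
    categories: list[str] | None = None
    cost_limit_usd: float = 10.0
    usage_tracking: bool = True

    def to_litellm_params(self) -> dict[str, Any]:
        # litellm routes requests on a "provider/model" string;
        # optional fields are passed through only when set.
        params: dict[str, Any] = {
            "model": f"{self.provider}/{self.model}",
            "max_tokens": self.max_tokens,
            "temperature": self.temperature,
        }
        if self.api_key:
            params["api_key"] = self.api_key
        if self.base_url:
            params["base_url"] = self.base_url
        return params

    @classmethod
    def from_dict(cls, config_dict: dict[str, Any]) -> LLMConfigSketch:
        return cls(**config_dict)


cfg = LLMConfigSketch.from_dict({"provider": "openai", "model": "gpt-4o"})
```

Unspecified fields fall back to the documented defaults, so `cfg` above uses `max_tokens=500` and `temperature=0.1`.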

piedomains.llm.prompts module

Prompt templates for LLM-based domain classification.

piedomains.llm.prompts.get_classification_prompt(domain, content, categories, max_content_length=8000)[source]

Generate classification prompt for text-only analysis.

Parameters:
  • domain (str) – Domain name to classify

  • content (str) – Extracted text content from the domain

  • categories (list[str]) – List of available categories

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string
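
A minimal sketch of what such a prompt builder might look like, assuming the content is truncated to `max_content_length` characters and the model is asked for a JSON answer; `classification_prompt_sketch` is hypothetical, not the package's template.

```python
def classification_prompt_sketch(domain, content, categories,
                                 max_content_length=8000):
    """Illustrative text-only prompt builder; the shipped template differs."""
    # Truncate extracted page text so the prompt fits the context window.
    snippet = content[:max_content_length]
    return (
        f"Classify the website '{domain}' into exactly one of these "
        f"categories: {', '.join(categories)}.\n"
        'Answer with JSON: {"category": "...", "confidence": 0.0, '
        '"reasoning": "..."}\n\n'
        f"Page text:\n{snippet}"
    )


prompt = classification_prompt_sketch(
    "nytimes.com",
    "Breaking news and analysis...",
    ["news", "shopping", "education"],
)
```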

piedomains.llm.prompts.get_multimodal_prompt(domain, content=None, categories=None, has_screenshot=False, max_content_length=6000)[source]

Generate classification prompt for multimodal analysis (text + image).

Parameters:
  • domain (str) – Domain name to classify

  • content (str | None) – Extracted text content (optional)

  • categories (list[str] | None) – List of available categories

  • has_screenshot (bool) – Whether a screenshot image is provided

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string
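
To see why `has_screenshot` matters downstream, here is a hypothetical sketch of how the returned prompt and an optional screenshot could be packaged into the OpenAI-style multimodal message list that litellm accepts; `multimodal_messages_sketch` is not part of the package.

```python
import base64


def multimodal_messages_sketch(prompt, screenshot_png=None):
    """Wrap a prompt (and optional PNG screenshot) in OpenAI-style
    message parts; vision models receive the image as a data URI."""
    parts = [{"type": "text", "text": prompt}]
    if screenshot_png is not None:
        b64 = base64.b64encode(screenshot_png).decode("ascii")
        parts.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": parts}]


msgs = multimodal_messages_sketch("Classify example.com ...", b"\x89PNG fake bytes")
```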

piedomains.llm.prompts.get_custom_prompt(domain, content=None, categories=None, custom_instructions=None, has_screenshot=False)[source]

Generate a custom classification prompt.

Parameters:
  • domain (str) – Domain name to classify

  • content (str | None) – Extracted text content (optional)

  • categories (list[str] | None) – List of available categories

  • custom_instructions (str | None) – Custom classification instructions

  • has_screenshot (bool) – Whether a screenshot image is provided

Return type:

str

Returns:

Formatted prompt string

piedomains.llm.prompts.get_batch_prompt(domains_data, categories)[source]

Generate a batch classification prompt for multiple domains.

Parameters:
  • domains_data (list[dict[str, Any]]) – List of dictionaries, one per domain to classify

  • categories (list[str]) – List of available categories

Return type:

str

Returns:

Formatted batch prompt string
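
A hypothetical sketch of such a batch prompt, assuming each entry in `domains_data` carries a `domain` key and optionally a `content` key (the real key names and template are not documented here):

```python
def batch_prompt_sketch(domains_data, categories):
    """Illustrative batch prompt: a shared instruction header followed
    by one numbered entry per domain."""
    lines = [
        "Classify each website below into one of: " + ", ".join(categories) + ".",
        "Return a JSON array with one object per site, in the same order.",
        "",
    ]
    for i, item in enumerate(domains_data, 1):
        # Keep per-domain snippets short so many domains fit in one request.
        snippet = (item.get("content") or "")[:500]
        lines.append(f"{i}. {item['domain']}: {snippet}")
    return "\n".join(lines)


batch = batch_prompt_sketch(
    [{"domain": "arxiv.org", "content": "Open-access e-prints"},
     {"domain": "etsy.com"}],
    ["science", "shopping"],
)
```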

piedomains.llm.response_parser module

Response parsing utilities for LLM classification results.

piedomains.llm.response_parser.parse_llm_response(response_text)[source]

Parse LLM response into structured classification result.

Parameters:

response_text (str) – Raw response text from LLM

Return type:

dict[str, Any]

Returns:

Dictionary with parsed classification data

Raises:

ValueError – If response cannot be parsed
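
LLM replies often wrap their JSON in markdown fences or surrounding chatter, so a typical parser extracts the first JSON object before decoding and raises `ValueError` when none is found. A hypothetical sketch of that contract (`parse_response_sketch` is not the package's implementation):

```python
import json
import re


def parse_response_sketch(response_text):
    """Illustrative parser: pull the first JSON object out of a model
    reply, tolerating code fences and chatter around it."""
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in LLM response")
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed JSON in LLM response: {exc}") from exc


result = parse_response_sketch(
    '```json\n{"category": "news", "confidence": 0.92}\n```'
)
```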

piedomains.llm.response_parser.parse_batch_response(response_text)[source]

Parse batch LLM response into list of classification results.

Parameters:

response_text (str) – Raw batch response text from LLM

Return type:

list[dict[str, Any]]

Returns:

List of dictionaries with parsed classification data

Raises:

ValueError – If response cannot be parsed
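
Batch parsing follows the same pattern with a JSON array instead of a single object. A hypothetical sketch mirroring the documented contract (`parse_batch_sketch` is not the package's implementation):

```python
import json
import re


def parse_batch_sketch(response_text):
    """Illustrative batch parser: extract the first JSON array from the
    reply, validate its shape, and raise ValueError on failure."""
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON array found in batch response")
    try:
        results = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed JSON in batch response: {exc}") from exc
    if not isinstance(results, list):
        raise ValueError("Batch response is not a JSON array")
    return results


rows = parse_batch_sketch(
    'Here are the results:\n'
    '[{"domain": "arxiv.org", "category": "science"},\n'
    ' {"domain": "etsy.com", "category": "shopping"}]'
)
```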

Module contents

LLM-based classification utilities for piedomains.

class piedomains.llm.LLMConfig(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)[source]

Bases: object

Configuration for LLM-based classification.

provider

LLM provider (e.g., ‘openai’, ‘anthropic’, ‘google’)

model

Model name (e.g., ‘gpt-4o’, ‘claude-3-5-sonnet-20241022’, ‘gemini-1.5-pro’)

api_key

API key for the provider

base_url

Optional base URL for custom endpoints

max_tokens

Maximum number of tokens in the model's response

temperature

Sampling temperature for response generation (lower values give more deterministic output)

categories

List of classification categories

cost_limit_usd

Cost limit in USD for API usage

usage_tracking

Whether to track API usage

__init__(provider, model, api_key=None, base_url=None, max_tokens=500, temperature=0.1, categories=None, cost_limit_usd=10.0, usage_tracking=True)
__post_init__()[source]

Validate and set defaults after initialization.

Return type:

None

api_key: str | None = None
base_url: str | None = None
categories: list[str] | None = None
cost_limit_usd: float = 10.0
classmethod from_dict(config_dict)[source]

Create LLMConfig from dictionary.

Return type:

LLMConfig

max_tokens: int = 500
temperature: float = 0.1
to_litellm_params()[source]

Convert the configuration to a litellm parameter dictionary.

Return type:

dict[str, Any]

usage_tracking: bool = True
provider: str
model: str
piedomains.llm.get_classification_prompt(domain, content, categories, max_content_length=8000)[source]

Generate classification prompt for text-only analysis.

Parameters:
  • domain (str) – Domain name to classify

  • content (str) – Extracted text content from the domain

  • categories (list[str]) – List of available categories

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string

piedomains.llm.get_multimodal_prompt(domain, content=None, categories=None, has_screenshot=False, max_content_length=6000)[source]

Generate classification prompt for multimodal analysis (text + image).

Parameters:
  • domain (str) – Domain name to classify

  • content (str | None) – Extracted text content (optional)

  • categories (list[str] | None) – List of available categories

  • has_screenshot (bool) – Whether a screenshot image is provided

  • max_content_length (int) – Maximum length of content to include

Return type:

str

Returns:

Formatted prompt string

piedomains.llm.parse_llm_response(response_text)[source]

Parse LLM response into structured classification result.

Parameters:

response_text (str) – Raw response text from LLM

Return type:

dict[str, Any]

Returns:

Dictionary with parsed classification data

Raises:

ValueError – If response cannot be parsed