
Active learning changes how we train AI models by intelligently selecting the most valuable data to annotate. Combined with powerful LLMs such as Google Gemini, it creates efficient annotation pipelines that reduce manual effort while maintaining high data quality.
This guide explores how to build such pipelines using the Adala framework, a powerful but underused tool for autonomous data labeling.
We will implement a medical symptom classifier that uses Gemini's capabilities through a structured active learning workflow.
Understanding Active Learning for Data Annotation

Active learning addresses a central challenge of supervised learning: obtaining large amounts of labeled data. Instead of randomly selecting data points to annotate, active learning algorithms identify the most informative samples, the ones that contribute most to improving the model.
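One common way to decide which samples are "most informative" is uncertainty sampling: rank unlabeled samples by the entropy of the model's predicted class probabilities and annotate the highest-entropy ones first. The sketch below is a minimal illustration of that idea, not part of Adala. The function names and the probability values are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k sample ids whose predicted distribution has the highest entropy."""
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:k]


# Hypothetical class probabilities from some model (illustrative values only)
pool = {
    "sample_a": [0.97, 0.01, 0.01, 0.01],  # confident prediction
    "sample_b": [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    "sample_c": [0.60, 0.30, 0.05, 0.05],  # somewhat uncertain
}

print(select_most_uncertain(pool, k=2))
```

Here the confident `sample_a` is skipped, so annotation effort goes to the two samples the model is least sure about.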
Why active learning matters:
The Adala framework brings these benefits into production workflows by providing modular components that simplify the active learning process. Before starting the implementation, let's examine what makes Adala particularly well suited to integration with modern LLMs such as Google Gemini.
What Is Adala? An Introduction to the Framework

Adala (Autonomous DAta Labeling Agent) is an open-source framework designed specifically for implementing specialized data-processing agents. Unlike traditional annotation tools, Adala takes an agent-based approach that combines:
Looking at Adala's quickstart example, we can see how it structures sentiment classification:
python
import pandas as pd

from adala.agents import Agent
from adala.environments import StaticEnvironment
from adala.skills import ClassificationSkill
from adala.runtimes import OpenAIChatRuntime
from rich import print

# Train dataset
train_df = pd.DataFrame([
    ["It was the negative first impressions, and then it started working.", "Positive"],
    ["Not loud enough and doesn't turn on like it should.", "Negative"],
    ["I don't know what to say.", "Neutral"],
    ["Manager was rude, but the most important that mic shows very flat frequency response.", "Positive"],
    ["The phone doesn't seem to accept anything except CBR mp3s.", "Negative"],
    ["I tried it before, I bought this device for my son.", "Neutral"],
], columns=["text", "sentiment"])

# Test dataset
test_df = pd.DataFrame([
    "All three broke within two months of use.",
    "The device worked for a long time, can't say anything bad.",
    "Just a random line of text."
], columns=["text"])

agent = Agent(
    # connect to a dataset
    environment=StaticEnvironment(df=train_df),
    # define a skill
    skills=ClassificationSkill(
        name='sentiment',
        instructions="Label text as positive, negative or neutral.",
        labels=["Positive", "Negative", "Neutral"],
        input_template="Text: {text}",
        output_template="Sentiment: {sentiment}"
    ),
    # define runtimes
    runtimes={
        'openai': OpenAIChatRuntime(model='gpt-4o'),
    },
    teacher_runtimes={
        'default': OpenAIChatRuntime(model='gpt-4o'),
    },
    default_runtime='openai',
)

agent.learn(learning_iterations=3, accuracy_threshold=0.95)
predictions = agent.run(test_df)
For the medical symptom classification task, we adapt this architecture to integrate Google Gemini while implementing a custom active learning strategy.
Setting Up Your Environment
Let's start by installing Adala and the required dependencies:
python
# Install Adala directly from GitHub
!pip install -q git+https://github.com/HumanSignal/Adala.git

# Verify installation
!pip list | grep adala

# Install additional dependencies
!pip install -q google-generativeai pandas matplotlib numpy
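As a complement to `pip list`, the installation can also be checked from Python itself. The following is a small sketch using only the standard library. The helper name `check_modules` is hypothetical, and the list contains import names, which can differ from pip package names.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Top-level import names only; dotted names such as google.generativeai
# would require the parent package to be importable first.
required = ["adala", "pandas", "matplotlib", "numpy"]

for name, available in check_modules(required).items():
    print(f"{name}: {'ok' if available else 'MISSING'}")
```

Running this right after the install cell makes a failed installation visible before any later import fails.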
We also need to clone the repository to access its components directly:
python
# Clone the repository for access to source files
!git clone https://github.com/HumanSignal/Adala.git

# Ensure the package is in our Python path
import sys
sys.path.append('./Adala')

# Import key components (module paths as given in this tutorial;
# check them against the Adala version you cloned)
from Adala.adala.annotators.base import BaseAnnotator
from Adala.adala.strategies.random_strategy import RandomStrategy
from Adala.adala.utils.custom_types import TextSample, LabeledSample
Integrating Google Gemini as a Custom Annotator
Unlike the original implementation, which used a simple wrapper around Google Gemini, we build a robust annotator that follows Adala's design patterns. This makes our solution maintainable and extensible.
First, we need to set up the Google Generative AI client:
python
import os
from getpass import getpass

import google.generativeai as genai

# Set API key from environment or enter manually
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY") or getpass("Enter your Gemini API Key: ")
genai.configure(api_key=GEMINI_API_KEY)
Now we create a custom annotator by extending Adala's BaseAnnotator class:
python
import json
import re
from typing import Any, Dict, List, Optional


class GeminiAnnotator(BaseAnnotator):
    """Custom annotator using Google Gemini for medical symptom classification."""

    def __init__(self,
                 model_name: str = "models/gemini-2.0-flash-lite",
                 categories: Optional[List[str]] = None,
                 temperature: float = 0.1):
        """Initialize the Gemini annotator.

        Args:
            model_name: The Gemini model to use
            categories: List of valid classification categories
            temperature: Controls randomness in generation (lower = more deterministic)
        """
        self.model = genai.GenerativeModel(
            model_name=model_name,
            generation_config={"temperature": temperature}
        )
        self.categories = categories or ["Cardiovascular", "Respiratory",
                                         "Gastrointestinal", "Neurological"]

    def _build_prompt(self, text: str) -> str:
        """Create a structured prompt for the model.

        Args:
            text: The symptom text to classify
        Returns:
            A formatted prompt string
        """
        return f"""Classify this medical symptom into one of these categories:
        {', '.join(self.categories)}.
        Return JSON format: {{"category": "selected_category",
        "confidence": 0.XX, "explanation": "brief_reason"}}
        SYMPTOM: {text}"""

    def _parse_response(self, response: str) -> Dict[str, Any]:
        """Extract structured data from model response.

        Args:
            response: Raw text response from Gemini
        Returns:
            Dictionary containing parsed fields
        """
        try:
            # Extract JSON from response even if surrounded by text
            json_match = re.search(r'(\{.*\})', response, re.DOTALL)
            result = json.loads(json_match.group(1) if json_match else response)
            return {
                "category": result.get("category", "Unknown"),
                "confidence": result.get("confidence", 0.0),
                "explanation": result.get("explanation", "")
            }
        except Exception as e:
            return {
                "category": "Unknown",
                "confidence": 0.0,
                "explanation": f"Error parsing response: {str(e)}"
            }

    def annotate(self, samples: List[TextSample]) -> List[LabeledSample]:
        """Annotate a batch of text samples.

        Args:
            samples: List of TextSample objects
        Returns:
            List of LabeledSample objects with annotations
        """
        results = []
        for sample in samples:
            prompt = self._build_prompt(sample.text)
            try:
                response = self.model.generate_content(prompt).text
                parsed = self._parse_response(response)
                # Create labeled sample with metadata
                labeled_sample = LabeledSample(
                    text=sample.text,
                    labels=parsed["category"],
                    metadata={
                        "confidence": parsed["confidence"],
                        "explanation": parsed["explanation"]
                    }
                )
            except Exception as e:
                # Graceful error handling: keep the sample, record the error
                labeled_sample = LabeledSample(
                    text=sample.text,
                    labels="Unknown",
                    metadata={"error": str(e)}
                )
            # Store reference to original sample
            labeled_sample._sample = sample
            results.append(labeled_sample)
        return results
This implementation offers significant improvements over the original:
- It follows proper class inheritance from Adala's BaseAnnotator
- It implements private helper methods for prompt construction and response parsing
- It uses structured error handling and type hints
- It provides complete documentation
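The response-parsing step is the part most likely to meet unexpected input, because a chat model may wrap its JSON in extra prose. The standalone sketch below mirrors the regex-then-`json.loads` approach of `_parse_response` so it can be tried without an API key. The function name `parse_model_reply` and the sample replies are hypothetical.

```python
import json
import re
from typing import Any, Dict


def parse_model_reply(reply: str) -> Dict[str, Any]:
    """Decode the outermost {...} span of a reply as JSON, with safe defaults."""
    fallback = {"category": "Unknown", "confidence": 0.0, "explanation": ""}
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return {**fallback, "explanation": f"Error parsing response: {exc}"}
    # Keep only the expected fields, filling in defaults for missing ones
    return {key: data.get(key, default) for key, default in fallback.items()}


# Hypothetical replies: JSON surrounded by prose, and a reply with no JSON at all
wrapped = 'Sure, here is the result: {"category": "Respiratory", "confidence": 0.9, "explanation": "dry cough"} Hope that helps.'
print(parse_model_reply(wrapped))
print(parse_model_reply("I am not sure."))
```

Returning a fixed "Unknown" record instead of raising keeps a single malformed reply from stopping a whole annotation batch.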
Building the Symptom Classification Pipeline
Let's create a dataset of medical symptoms for our classification task. Unlike the original implementation, we use a more diverse dataset with balanced representation across categories:
python
# Create a more comprehensive dataset
symptom_data = [
    # Cardiovascular symptoms
    "Chest pain radiating to left arm during exercise",
    "Heart palpitations when lying down",
    "Swollen ankles and shortness of breath",
    "Dizziness when standing up quickly",
    # Respiratory symptoms
    "Persistent dry cough with occasional wheezing",
    "Shortness of breath when climbing stairs",
    "Coughing up yellow or green mucus",
    "Rapid breathing with chest tightness",
    # Gastrointestinal symptoms
    "Stomach cramps and nausea after eating",
    "Burning sensation in upper abdomen",
    "Frequent loose stools with abdominal pain",
    "Yellowing of skin and eyes",
    # Neurological symptoms
    "Severe headache with sensitivity to light",
    "Numbness in fingers of right hand",
    "Memory loss and confusion",
    "Tremors in hands when reaching for objects"
]

# Convert to TextSample objects
text_samples = [TextSample(text=text) for text in symptom_data]
Implementing Advanced Active Learning Strategies
The original implementation used a simple priority scoring mechanism. We extend it with multiple strategies to demonstrate Adala's flexibility:
python
from typing import Callable, Dict, List

import numpy as np


class PrioritizationStrategy:
    """Base class for sample prioritization strategies."""

    def score_samples(self, samples: List[TextSample]) -> np.ndarray:
        """Assign priority scores to samples.

        Args:
            samples: List of samples to score
        Returns:
            Array of scores, higher values indicate higher priority
        """
        raise NotImplementedError("Subclasses must implement this method")

    def select(self, samples: List[TextSample], n: int = 1) -> List[TextSample]:
        """Select the top n highest scoring samples.

        Args:
            samples: List of samples to select from
            n: Number of samples to select
        Returns:
            List of selected samples
        """
        if not samples:
            return []
        scores = self.score_samples(samples)
        indices = np.argsort(-scores)[:n]  # Descending order
        return [samples[i] for i in indices]


class KeywordPriority(PrioritizationStrategy):
    """Prioritize samples based on medical urgency keywords."""

    def __init__(self, keyword_weights: Dict[str, float]):
        """Initialize with keyword weights.

        Args:
            keyword_weights: Dictionary mapping keywords to priority weights
        """
        self.keyword_weights = keyword_weights

    def score_samples(self, samples: List[TextSample]) -> np.ndarray:
        scores = np.zeros(len(samples))
        for i, sample in enumerate(samples):
            # Base score
            scores[i] = 0.1
            # Add weights for each keyword found
            text_lower = sample.text.lower()
            for keyword, weight in self.keyword_weights.items():
                if keyword in text_lower:
                    scores[i] += weight
        return scores


class UncertaintyPriority(PrioritizationStrategy):
    """Prioritize samples based on model uncertainty."""

    def __init__(self, model_fn: Callable[[List[TextSample]], List[float]]):
        """Initialize with uncertainty model function.

        Args:
            model_fn: Function that returns uncertainty scores for samples
        """
        self.model_fn = model_fn

    def score_samples(self, samples: List[TextSample]) -> np.ndarray:
        # Higher uncertainty = higher priority
        return np.array(self.model_fn(samples))


# Configure the keyword-based strategy (substring matches, so "dizz"
# also matches "dizziness" and "head" also matches "headache")
keyword_weights = {
    "chest": 0.5,
    "pain": 0.4,
    "breathing": 0.4,
    "dizz": 0.3,
    "head": 0.2,
    "numb": 0.2
}
keyword_strategy = KeywordPriority(keyword_weights)
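The code above defines both a keyword strategy and an uncertainty strategy but only instantiates the first. One way to combine them is a weighted sum of the two scores. The sketch below shows that blend on plain strings so it runs without Adala. The function names, the `alpha` parameter and the uncertainty values are assumptions for illustration, not part of the tutorial's classes.

```python
from typing import Dict, List


def keyword_score(text: str, weights: Dict[str, float], base: float = 0.1) -> float:
    """Base score plus the weight of every keyword found in the text."""
    lowered = text.lower()
    return base + sum(w for kw, w in weights.items() if kw in lowered)


def combined_scores(texts: List[str], weights: Dict[str, float],
                    uncertainties: List[float], alpha: float = 0.5) -> List[float]:
    """Blend keyword priority and model uncertainty; alpha is the keyword share."""
    return [
        alpha * keyword_score(text, weights) + (1 - alpha) * uncertainty
        for text, uncertainty in zip(texts, uncertainties)
    ]


weights = {"chest": 0.5, "pain": 0.4, "breathing": 0.4}
texts = [
    "Chest pain radiating to left arm during exercise",
    "Memory loss and confusion",
    "Rapid breathing with chest tightness",
]
uncertainties = [0.1, 0.9, 0.2]  # hypothetical model uncertainty per sample

scores = combined_scores(texts, weights, uncertainties, alpha=0.5)
# Indices of the texts, highest combined score first
ranking = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

With `alpha=1.0` this reduces to pure keyword ranking and with `alpha=0.0` to pure uncertainty ranking, so the parameter can be tuned per task.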
Now let's implement our improved active learning loop:
python
from matplotlib import pyplot as plt


def run_active_learning_loop(
    samples: List[TextSample],
    annotator: GeminiAnnotator,
    strategy: PrioritizationStrategy,
    iterations: int = 5,
    batch_size: int = 1,
    visualization_interval: int = 1
):
    """Run an active learning loop with visualization.

    Args:
        samples: Pool of unlabeled samples
        annotator: Annotation system
        strategy: Sample selection strategy
        iterations: Number of learning iterations
        batch_size: Samples to annotate per iteration
        visualization_interval: How often to update visualizations
    Returns:
        List of labeled samples
    """
    labeled_samples = []
    remaining_samples = list(samples)
    print("\nStarting Active Learning Loop:")
    for i in range(iterations):
        print(f"\n--- Iteration {i+1}/{iterations} ---")
        # Filter out already labeled samples
        labeled_originals = [getattr(l, '_sample', l) for l in labeled_samples]
        remaining_samples = [
            s for s in remaining_samples
            if s not in labeled_originals
        ]
        if not remaining_samples:
            print("No more samples to label. Stopping.")
            break
        # Select most important samples
        selected = strategy.select(remaining_samples, n=batch_size)
        # Annotate selected samples
        newly_labeled = annotator.annotate(selected)
        labeled_samples.extend(newly_labeled)
        # Display annotation results
        for sample in newly_labeled:
            print(f"Text: {sample.text}")
            print(f"Category: {sample.labels}")
            print(f"Confidence: {sample.metadata.get('confidence', 0):.2f}")
            explanation = sample.metadata.get('explanation', '')
            # Truncate long explanations but always keep the label
            if len(explanation) > 100:
                explanation = explanation[:100] + "..."
            print(f"Explanation: {explanation}")
            print()
        # Visualize results periodically
        if (i + 1) % visualization_interval == 0:
            visualize_results(labeled_samples)
    return labeled_samples


def visualize_results(labeled_samples: List[LabeledSample]):
    """Create visualizations of annotation results.

    Args:
        labeled_samples: List of labeled samples to visualize
    """
    if not labeled_samples:
        return
    # Extract data
    categories = [s.labels for s in labeled_samples]
    confidence = [s.metadata.get("confidence", 0) for s in labeled_samples]
    texts = [s.text[:30] + "..." for s in labeled_samples]
    # Set up plots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    # Plot 1: Confidence by category
    category_counts = {}
    category_confidence = {}
    for cat, conf in zip(categories, confidence):
        if cat not in category_counts:
            category_counts[cat] = 0
            category_confidence[cat] = 0
        category_counts[cat] += 1
        category_confidence[cat] += conf
    for cat in category_confidence:
        category_confidence[cat] /= category_counts[cat]
    cats = list(category_counts.keys())
    counts = list(category_counts.values())
    avg_conf = list(category_confidence.values())
    x = np.arange(len(cats))
    width = 0.35
    ax1.bar(x - width/2, counts, width, label='Count')
    ax1.bar(x + width/2, avg_conf, width, label='Avg Confidence')
    ax1.set_xticks(x)
    ax1.set_xticklabels(cats, rotation=45)
    ax1.set_title('Category Distribution and Confidence')
    ax1.legend()
    # Plot 2: Individual sample confidence
    sorted_indices = np.argsort(confidence)
    ax2.barh(range(len(texts)), [confidence[i] for i in sorted_indices])
    ax2.set_yticks(range(len(texts)))
    ax2.set_yticklabels([texts[i] for i in sorted_indices])
    ax2.set_title('Sample Confidence')
    ax2.set_xlabel('Confidence')
    plt.tight_layout()
    plt.show()
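The loop above calls the Gemini API on every iteration, which makes it awkward to check the selection logic on its own. The sketch below reproduces the same select, annotate, remove cycle with a stub annotator and no network access. Everything in it (`Labeled`, `stub_annotate`, `dry_run`, the scores) is a hypothetical stand-in, not an Adala or Gemini API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Labeled:
    """Minimal stand-in for a labeled sample."""
    text: str
    label: str
    metadata: Dict[str, float] = field(default_factory=dict)


def stub_annotate(text: str) -> Labeled:
    """Stand-in for a model call: a fixed keyword lookup, no network access."""
    label = "Cardiovascular" if "chest" in text.lower() else "Neurological"
    return Labeled(text=text, label=label, metadata={"confidence": 0.5})


def dry_run(pool: List[str], score: Callable[[str], float],
            iterations: int, batch_size: int = 1) -> List[Labeled]:
    """Repeatedly pick the highest-scoring unlabeled texts and annotate them."""
    remaining = list(pool)
    labeled: List[Labeled] = []
    for _ in range(iterations):
        if not remaining:
            break
        batch = sorted(remaining, key=score, reverse=True)[:batch_size]
        labeled.extend(stub_annotate(text) for text in batch)
        remaining = [text for text in remaining if text not in batch]
    return labeled


pool = [
    "Memory loss and confusion",
    "Chest pain radiating to left arm during exercise",
    "Numbness in fingers of right hand",
]
# Hypothetical priority scores, one per text
priority = {pool[0]: 0.1, pool[1]: 0.9, pool[2]: 0.3}

result = dry_run(pool, score=priority.get, iterations=2)
for item in result:
    print(f"{item.label}: {item.text}")
```

Because two iterations with a batch size of one label only the two highest-priority texts, the lowest-scoring sample stays in the pool, which is the behavior to expect from the real loop as well.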
Running the End-to-End Pipeline
Now we can run our complete active learning process:
python
# Initialize components
categories = ["Cardiovascular", "Respiratory", "Gastrointestinal", "Neurological"]
annotator = GeminiAnnotator(categories=categories)
strategy = keyword_strategy

# Run the active learning loop
labeled_data = run_active_learning_loop(
    samples=text_samples,
    annotator=annotator,
    strategy=strategy,
    iterations=5,
    visualization_interval=2
)

# Final visualization and analysis
visualize_results(labeled_data)

# Print summary statistics
print("\nAnnotation Summary:")
print(f"Total samples annotated: {len(labeled_data)}")
# Use a separate name so the configured category list is not overwritten
predicted_categories = [s.labels for s in labeled_data]
unique_categories = set(predicted_categories)
print(f"Categories found: {len(unique_categories)}")
for category in unique_categories:
    count = predicted_categories.count(category)
    print(f"  - {category}: {count} samples ({count/len(labeled_data):.1%})")
avg_confidence = sum(s.metadata.get("confidence", 0) for s in labeled_data) / len(labeled_data)
print(f"Average confidence: {avg_confidence:.2f}")
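The summary above divides by the number of labeled samples, which fails if the loop returned nothing. A possible alternative, sketched here with `collections.Counter`, returns an empty result for an empty input. The function name `summarize` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(labels: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each label to (count, share of all labels), most frequent first."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: (count, count / total) for label, count in Counter(labels).most_common()}


# Hypothetical labels as they might come back from an annotation run
labels = ["Cardiovascular", "Respiratory", "Cardiovascular", "Cardiovascular"]

for label, (count, share) in summarize(labels).items():
    print(f"  - {label}: {count} samples ({share:.1%})")
```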
Practical Applications and Extensions
Beyond medical symptom classification, this pipeline has numerous practical applications:
1. Content moderation
2. Customer feedback analysis
3. Clinical trial document processing
You can extend this implementation as follows:
Conclusion
The integration of Adala and Google Gemini provides a powerful framework for building intelligent annotation pipelines. By using active learning strategies, we can significantly reduce the manual effort required while maintaining high-quality annotations.
The modular design patterns demonstrated in this tutorial allow easy adaptation to different domains and annotation tasks.
For those interested in exploring further, the Adala GitHub repository offers additional examples and documentation for extending these concepts to complex annotation scenarios.

