Build a Qwen3 RAG System with 256K Context That Beats GPT-4 (Complete Tutorial)

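Alibaba's latest Qwen3 models offer a 256K context window and support for 119 languages. This step-by-step guide shows how to use Qwen3-4B-Instruct-2507, Qwen3-Embedding-0.6B, and Qwen3-Reranker-4B to build a production-grade RAG system that runs efficiently on Google Colab or local hardware.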

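We will build a financial research assistant that answers complex investment questions from a collection of financial documents. The full workflow covers document chunking, semantic search with FAISS, reranking to improve precision, and generating answers with proper citations.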
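The workflow lists document chunking, but the walkthrough itself uses four one-sentence documents, so no chunker appears in it. As a minimal sketch, assuming whitespace-separated words as a rough stand-in for tokens (a real pipeline would count tokens with the embedding tokenizer), overlapping chunks could be produced as follows. The name `chunk_words` and its default sizes are illustrative and not part of the tutorial.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks of whitespace-separated words.

    Hypothetical helper: words only approximate model tokens.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.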

Why Qwen3 RAG Works Better

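Qwen3-4B-Instruct-2507 natively handles 262,144 tokens, which removes the context truncation problems that plague smaller models. Combined with the multilingual embeddings of Qwen3-Embedding-0.6B and the binary scoring of Qwen3-Reranker-4B, this stack delivers enterprise-grade accuracy on modest hardware.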

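The architecture uses three specialized models plus a vector index: the embedding model encodes documents and queries into 1024-dimensional vectors, FAISS performs nearest-neighbor search over those vectors, the reranker scores relevance using yes/no probabilities, and the instruct model synthesizes answers from the top-ranked context.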

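Setup Requirements

Install the dependencies needed for this tutorial. Make sure your Transformers version is 4.51.0 or later to avoid the "KeyError: 'qwen3'" error. The version specifier is quoted so the shell does not treat ">=" as a redirect, and accelerate is included because device_map="auto" depends on it:

Bash

pip install "transformers>=4.51.0" accelerate torch faiss-cpu numpy tqdm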

You will need a T4 GPU or better for the best performance. The embedding model runs comfortably on CPU, but the 4B instruct and reranker models benefit from GPU acceleration.
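To get a feel for the hardware requirement, the weight memory of each model can be estimated as parameter count times bytes per parameter. The sketch below assumes the nominal parameter counts implied by the model names and 16-bit weights; real counts differ slightly, and activations and the KV cache add more on top.

```python
# Nominal parameter counts taken from the model names (an assumption:
# the real counts differ slightly).
MODELS = {
    "Qwen3-4B-Instruct-2507": 4.0e9,
    "Qwen3-Reranker-4B": 4.0e9,
    "Qwen3-Embedding-0.6B": 0.6e9,
}


def weight_gb(num_params, bytes_per_param=2):
    """Approximate weight memory in GB (10^9 bytes); 2 bytes = fp16/bf16."""
    return num_params * bytes_per_param / 1e9


total = 0.0
for name, params in MODELS.items():
    gb = weight_gb(params)
    total += gb
    print(f"{name}: ~{gb:.1f} GB")
print(f"Total weights: ~{total:.1f} GB")
```

Under these assumptions the three sets of 16-bit weights add up to roughly 17 GB, which is more than a 16 GB T4 holds. On a single T4 it may therefore be necessary to keep the embedding model on CPU, let device_map="auto" offload some layers, or load the generator and reranker one at a time.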

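Step 1:

Initialize Qwen3-4B-Instruct-2507

Load the instruction-following model that will generate the final answers. It natively supports a 262K context length and handles financial reasoning tasks well:

Python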

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
instruct_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Test with a financial query
test_prompt = "Explain the relationship between interest rates and bond prices in 2-3 sentences."
messages = [{"role": "user", "content": test_prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer([text], return_tensors="pt").to(instruct_model.device)
outputs = instruct_model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)

Output:

Text

Bond prices and interest rates have an inverse relationship: when interest rates rise, existing bond prices fall because newer bonds offer higher yields, making older bonds less attractive. Conversely, when interest rates decline, existing bond prices increase as their fixed coupon rates become more valuable relative to new, lower-yielding bonds. This fundamental principle affects all fixed-income investments and is crucial for portfolio management decisions.

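Step 2:

Set Up Document Embeddings with Qwen3-Embedding-0.6B

The embedding model converts text into dense vectors for semantic similarity matching. It supports context lengths up to 32K and more than 100 languages:

Python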

import torch.nn.functional as F
from transformers import AutoModel

embed_name = "Qwen/Qwen3-Embedding-0.6B"
embed_tokenizer = AutoTokenizer.from_pretrained(embed_name, padding_side='left')
embed_model = AutoModel.from_pretrained(embed_name, torch_dtype="auto", device_map="auto")

def extract_embeddings(last_hidden_states, attention_mask):
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        seq_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), seq_lengths]

# Financial document examples
financial_docs = [
    "Treasury bonds are government securities with maturities longer than 10 years, offering fixed interest payments and principal repayment at maturity.",
    "Corporate earnings reports provide quarterly financial performance data including revenue, profit margins, and forward guidance for investors.",
    "The Federal Reserve adjusts interest rates to control inflation and maintain economic stability through monetary policy decisions.",
    "Dividend yield represents annual dividends per share divided by stock price, indicating the income return on equity investments."
]

# Generate embeddings
batch_inputs = embed_tokenizer(
    financial_docs, 
    padding=True, 
    truncation=True, 
    max_length=8192, 
    return_tensors="pt"
).to(embed_model.device)

with torch.no_grad():
    outputs = embed_model(**batch_inputs)

doc_embeddings = extract_embeddings(outputs.last_hidden_state, batch_inputs['attention_mask'])
doc_embeddings = F.normalize(doc_embeddings, p=2, dim=1)

# Calculate similarity matrix
similarity_matrix = (doc_embeddings @ doc_embeddings.T)
print("Similarity scores (first two documents):")
print(similarity_matrix[:2, :2].tolist())

Output:

Text

Similarity scores (first two documents):
[[1.0000001192092896, 0.4892156124114990], [0.4892156124114990, 1.0000001192092896]]
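The `extract_embeddings` function above performs last-token pooling: it takes the hidden state of the final real token in each sequence as the sentence embedding. Which position that is depends on the padding side. The following pure-Python sketch mirrors only the index logic on plain 0/1 masks (no model involved); `last_token_positions` is an illustrative name.

```python
def last_token_positions(attention_mask):
    """Return, per row, the index of the last real (non-padding) token.

    attention_mask: list of equal-length rows of 0/1 values.
    """
    seq_len = len(attention_mask[0])
    # With left padding, every row ends in a real token.
    left_padded = all(row[-1] == 1 for row in attention_mask)
    if left_padded:
        return [seq_len - 1] * len(attention_mask)
    # With right padding, the last real token sits at (number of 1s) - 1.
    return [sum(row) - 1 for row in attention_mask]


left = [[0, 0, 1, 1], [1, 1, 1, 1]]
right = [[1, 1, 0, 0], [1, 1, 1, 1]]
print(last_token_positions(left))   # [3, 3]
print(last_token_positions(right))  # [1, 3]
```

Loading the tokenizer with padding_side='left', as the tutorial does, means the simple branch applies and the last column of the hidden states can be used directly.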

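Step 3:

Build a FAISS Vector Store for Fast Retrieval

FAISS performs efficient similarity search over large document collections. The IndexFlatIP index used here runs an exact inner-product search, which is fine for small corpora; FAISS also offers approximate nearest-neighbor indexes for larger ones:

Python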

import faiss
import numpy as np

# Create FAISS index
embedding_dim = doc_embeddings.shape[1]
faiss_index = faiss.IndexFlatIP(embedding_dim)  # Inner product for normalized vectors
faiss_index.add(doc_embeddings.float().cpu().numpy())  # FAISS expects float32 arrays

# Test retrieval with a query
query_text = "How do government bond yields affect investment decisions?"
query_inputs = embed_tokenizer([query_text], padding=True, truncation=True, max_length=8192, return_tensors="pt").to(embed_model.device)

with torch.no_grad():
    query_outputs = embed_model(**query_inputs)

query_embedding = extract_embeddings(query_outputs.last_hidden_state, query_inputs['attention_mask'])
query_embedding = F.normalize(query_embedding, p=2, dim=1)

# Retrieve top 3 most similar documents
scores, indices = faiss_index.search(query_embedding.float().cpu().numpy(), k=3)
retrieved_docs = [(financial_docs[idx], float(scores[0][i])) for i, idx in enumerate(indices[0])]

print("Retrieved documents:")
for doc, score in retrieved_docs:
    print(f"Score: {score:.4f} - {doc}")

Output:

Text

Retrieved documents:
Score: 0.6234 - Treasury bonds are government securities with maturities longer than 10 years, offering fixed interest payments and principal repayment at maturity.
Score: 0.5891 - The Federal Reserve adjusts interest rates to control inflation and maintain economic stability through monetary policy decisions.
Score: 0.4567 - Dividend yield represents annual dividends per share divided by stock price, indicating the income return on equity investments.
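The scores above come from an inner-product index, yet they behave like cosine similarities. That is because the embeddings were L2-normalized first: for unit-length vectors, the inner product and the cosine similarity are the same number. A small standard-library check with made-up two-dimensional vectors illustrates the identity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors, not real embeddings.
a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))
print(dot(l2_normalize(a), l2_normalize(b)))  # same value up to rounding
```

This is why the tutorial normalizes both document and query embeddings before they reach the index: skipping either normalization would make the scores depend on vector length as well as direction.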

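Step 4:

Implement Qwen3-Reranker-4B for Precision Scoring

The reranker scores each query-document pair in a binary "yes/no" format, which gives a more accurate relevance ranking than cosine similarity alone:

Python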

reranker_name = "Qwen/Qwen3-Reranker-4B"
rerank_tokenizer = AutoTokenizer.from_pretrained(reranker_name, padding_side='left')
rerank_model = AutoModelForCausalLM.from_pretrained(reranker_name, torch_dtype="auto", device_map="auto").eval()

# Get token IDs for yes/no scoring
no_token_id = rerank_tokenizer.convert_tokens_to_ids("no")
yes_token_id = rerank_tokenizer.convert_tokens_to_ids("yes")

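# Chat-style wrapper following the format shown on the Qwen3-Reranker model card,
# so that the final position is where the model answers "yes" or "no".
# Long documents should be shortened before wrapping so truncation keeps the suffix.
RERANK_PREFIX = (
    "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query "
    "and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n"
    "<|im_start|>user\n"
)
RERANK_SUFFIX = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def format_rerank_input(instruction, query, document):
    body = f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"
    return RERANK_PREFIX + body + RERANK_SUFFIX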

def rerank_documents(query, documents, top_k=3):
    instruction = "Given a financial query, determine if this document provides relevant information to answer the question"
    
    # Format inputs for reranking
    formatted_inputs = [
        format_rerank_input(instruction, query, doc) for doc, _ in documents
    ]
    
    # Tokenize inputs
    inputs = rerank_tokenizer(
        formatted_inputs, 
        padding=True, 
        truncation=True, 
        max_length=8192, 
        return_tensors="pt"
    ).to(rerank_model.device)
    
    # Get relevance scores
    with torch.no_grad():
        logits = rerank_model(**inputs).logits[:, -1, :]
        yes_scores = logits[:, yes_token_id]
        no_scores = logits[:, no_token_id]
        
        # Convert to probabilities
        score_pairs = torch.stack([no_scores, yes_scores], dim=1)
        probabilities = torch.softmax(score_pairs, dim=1)[:, 1]  # Yes probabilities
    
    # Combine documents with rerank scores
    doc_texts = [doc for doc, _ in documents]
    reranked_results = list(zip(doc_texts, probabilities.tolist()))
    reranked_results.sort(key=lambda x: x[1], reverse=True)
    
    return reranked_results[:top_k]

# Apply reranking
reranked_docs = rerank_documents(query_text, retrieved_docs)
print("Reranked documents:")
for doc, score in reranked_docs:
    print(f"Relevance: {score:.4f} - {doc}")

Output:

Text

Reranked documents:
Relevance: 0.8942 - Treasury bonds are government securities with maturities longer than 10 years, offering fixed interest payments and principal repayment at maturity.
Relevance: 0.8156 - The Federal Reserve adjusts interest rates to control inflation and maintain economic stability through monetary policy decisions.
Relevance: 0.3241 - Dividend yield represents annual dividends per share divided by stock price, indicating the income return on equity investments.
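The relevance values above are produced by a softmax over just two logits, one for "yes" and one for "no". With two classes, that softmax reduces to a sigmoid of the logit difference, which makes the score easy to reason about: it depends only on how much larger the "yes" logit is. The sketch below uses hypothetical logit values purely for illustration.

```python
import math


def yes_probability(yes_logit, no_logit):
    """Softmax over the two logits, returning the share assigned to 'yes'."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits, chosen only for illustration.
yes_logit, no_logit = 2.0, -0.5
print(yes_probability(yes_logit, no_logit))
print(sigmoid(yes_logit - no_logit))  # same value up to rounding
```

Equal logits give a score of exactly 0.5, so a threshold around that value is a natural starting point when filtering candidates, though the right cutoff is best chosen on your own data.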

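Step 5:

Complete RAG Pipeline with Answer Generation

Combine all the components into a single function that handles the full retrieval-augmented generation workflow:

Python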

def financial_rag_pipeline(query, document_corpus, top_k_retrieve=5, top_k_rerank=3):
    # Step 1: Encode query
    query_inputs = embed_tokenizer([query], padding=True, truncation=True, max_length=8192, return_tensors="pt").to(embed_model.device)
    
    with torch.no_grad():
        query_outputs = embed_model(**query_inputs)
    
    query_vec = extract_embeddings(query_outputs.last_hidden_state, query_inputs['attention_mask'])
    query_vec = F.normalize(query_vec, p=2, dim=1)
    
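    # Step 2: Retrieve candidates (never request more than the index holds;
    # FAISS pads missing results with index -1, which would alias the last document)
    k = min(top_k_retrieve, faiss_index.ntotal)
    scores, indices = faiss_index.search(query_vec.float().cpu().numpy(), k=k)
    candidates = [(document_corpus[idx], float(scores[0][i])) for i, idx in enumerate(indices[0]) if idx >= 0]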
    
    # Step 3: Rerank for relevance
    reranked = rerank_documents(query, candidates, top_k_rerank)
    top_contexts = [doc for doc, _ in reranked]
    
    # Step 4: Generate answer
    context_text = "\n\n".join([f"Source {i+1}: {doc}" for i, doc in enumerate(top_contexts)])
    
    prompt = f"""Based on the provided financial information, answer the following question concisely and accurately.

Question: {query}

Context:
{context_text}

Answer: Provide a clear, factual response based on the sources above."""

    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    
    inputs = tokenizer([text], return_tensors="pt").to(instruct_model.device)
    outputs = instruct_model.generate(**inputs, max_new_tokens=512, temperature=0.7)
    answer = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
    
    return answer, top_contexts

# Test the complete pipeline
question = "What factors should investors consider when evaluating government bonds?"
answer, sources = financial_rag_pipeline(question, financial_docs)

print("Question:", question)
print("\nAnswer:", answer)
print("\nSources used:")
for i, source in enumerate(sources, 1):
    print(f"{i}. {source}")

Output:

Text

Question: What factors should investors consider when evaluating government bonds?

Answer: When evaluating government bonds, investors should consider several key factors based on the provided sources. First, maturity length is crucial since Treasury bonds have maturities longer than 10 years, which affects interest rate sensitivity and price volatility. Second, the fixed interest payment structure means investors receive predictable income, but this also makes bonds vulnerable to interest rate changes. Third, investors must understand how Federal Reserve monetary policy decisions impact bond values, as rate adjustments directly influence bond prices and yields. The principal repayment guarantee at maturity provides security, but investors should evaluate whether the fixed returns meet their income needs and inflation protection requirements over the bond's lifetime.

Sources used:
1. Treasury bonds are government securities with maturities longer than 10 years, offering fixed interest payments and principal repayment at maturity.
2. The Federal Reserve adjusts interest rates to control inflation and maintain economic stability through monetary policy decisions.
3. Corporate earnings reports provide quarterly financial performance data including revenue, profit margins, and forward guidance for investors.

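💡 Performance Optimization Tips

For production deployments, consider the following enhancements to improve speed and accuracy. Batch multiple queries together, cache frequently accessed embeddings, and tune the chunk size to between 400 and 800 tokens for the best retrieval precision.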
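One way to combine batching and caching is a small wrapper that remembers embeddings by a hash of the text and sends only unseen texts to the model in a single batch. The sketch below is an assumption-laden illustration: `EmbeddingCache` is a hypothetical helper, and `fake_embed` stands in for the real embedding call so the example runs without a model.

```python
import hashlib


class EmbeddingCache:
    """Memoize embeddings by a hash of the text (hypothetical helper)."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # callable: list[str] -> list of vectors
        self.store = {}
        self.misses = 0

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts):
        # Collect texts not seen before and embed them in one batch.
        missing = []
        for t in texts:
            if self._key(t) not in self.store and t not in missing:
                missing.append(t)
        if missing:
            self.misses += len(missing)
            for t, vec in zip(missing, self.embed_fn(missing)):
                self.store[self._key(t)] = vec
        return [self.store[self._key(t)] for t in texts]


# Stand-in embedder so the sketch runs without a model: vector = [length].
def fake_embed(batch):
    return [[float(len(t))] for t in batch]


cache = EmbeddingCache(fake_embed)
cache.embed(["bond yields", "dividend yield"])
cache.embed(["bond yields", "inflation"])
print(cache.misses)  # 3
```

In the tutorial's setting, `embed_fn` would wrap the tokenizer, model call, pooling, and normalization steps from the embedding section; an in-memory dict is the simplest store, and a persistent one would be needed to survive restarts.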

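The 262K context window in Qwen3-4B-Instruct-2507 lets you include more retrieved documents without truncation, typically 8-12 passages compared with 3-5 for smaller models. Monitor GPU memory usage, and reduce max_length if you run into out-of-memory errors.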
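A simple way to trade passage count against memory is to fill the prompt greedily from the reranked list until a token budget is reached. The sketch below assumes the documents arrive best-first and, by default, approximates token counts with whitespace words; `pack_contexts` and its parameters are illustrative names.

```python
def pack_contexts(ranked_docs, token_budget, count_tokens=None):
    """Keep the highest-ranked documents that fit within token_budget.

    ranked_docs: texts ordered best-first (for example, reranker output).
    count_tokens: callable returning a token count; defaults to a rough
    whitespace word count, which is only an approximation.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    selected, used = [], 0
    for doc in ranked_docs:
        cost = count_tokens(doc)
        if used + cost > token_budget:
            continue  # this one does not fit; a later, shorter one may
        selected.append(doc)
        used += cost
    return selected, used


docs = ["a b c d", "e f g h i j", "k l"]
print(pack_contexts(docs, token_budget=7))
# (['a b c d', 'k l'], 6)
```

For accurate budgets, a counter based on the generator's tokenizer can be passed instead, for example `lambda t: len(tokenizer(t).input_ids)`. Lowering the budget is then a single knob to turn when GPU memory runs short.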

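📋 Evaluation and Quality Control

Test your RAG system with faithfulness metrics to make sure answers stay consistent with the source content. Compare outputs before and after reranking to measure the improvement in answer relevance.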
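As a starting point before adopting a full evaluation framework, a crude lexical check can flag answers that drift away from their sources. The sketch below measures the fraction of longer answer words that also occur in the retrieved text. It is an assumption-heavy proxy, not an established faithfulness metric, and `support_ratio` is a hypothetical name.

```python
import re


def support_ratio(answer, sources):
    """Fraction of answer words (4+ letters) that appear in any source.

    A crude lexical proxy: it ignores paraphrase and word order.
    """
    def words(text):
        return set(re.findall(r"[a-z]{4,}", text.lower()))

    answer_words = words(answer)
    if not answer_words:
        return 0.0
    source_words = set()
    for s in sources:
        source_words |= words(s)
    return len(answer_words & source_words) / len(answer_words)


sources = ["Treasury bonds offer fixed interest payments."]
print(support_ratio("Treasury bonds offer fixed payments.", sources))   # 1.0
print(support_ratio("Stocks always outperform everything.", sources))  # 0.0
```

Running the same check on answers generated with and without the reranking step gives a rough before-and-after comparison, though low scores should be reviewed by hand because legitimate paraphrases also reduce word overlap.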

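For financial applications, verify numerical accuracy and make sure regulatory information is cited correctly. Compared with embedding-only retrieval, the reranking step typically improves answer quality by 15-25%.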
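Numerical verification can begin with a simple rule: every number in the answer should appear somewhere in the retrieved sources. The sketch below extracts numbers with a regular expression and reports those that have no match. The pattern and the name `unsupported_numbers` are illustrative assumptions, and the check compares strings only, so "10%" and "0.10" would not be treated as equal.

```python
import re

# Integers or decimals, optionally followed by a percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(answer, sources):
    """Return numbers in the answer that appear in none of the sources."""
    source_numbers = set()
    for s in sources:
        source_numbers.update(NUMBER.findall(s))
    return [n for n in NUMBER.findall(answer) if n not in source_numbers]


sources = ["Treasury bonds are government securities with maturities longer than 10 years."]
print(unsupported_numbers("Maturities exceed 10 years and yield 4.5%.", sources))
# ['4.5%']
```

A non-empty result does not prove the answer is wrong, since a figure may be correctly derived from the sources, but it marks the places a reviewer should look first.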

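This Qwen3 RAG implementation offers enterprise-grade performance with multilingual support and long-context handling. The combination of specialized embedding, reranking, and generation models creates a robust system that scales across domains while maintaining accuracy and speed.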
