Building a Reasoning RAG with Open-Source Tools

삐롱K 2025. 7. 23. 01:04

Github CODE
These notes summarize the following lecture.

Keyword: Reasoning RAG, Langchain, Ollama, QdrantDB, Docling

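1. Installing the libraries

For the vector DB I used Qdrant. In production, ease of maintenance matters most, and Qdrant's web UI lets you check which collections exist and which documents were inserted through which chunking process.
Docling is a library that parses the entities in a document and converts them to Markdown so that an LLM can understand them easily.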

pip install -q langchain langgraph langchain-docling langchain-qdrant langchain-text-splitters langchain-ollama langchain-community sentence-transformers

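2. Loading the models

The reasoning model and the answer model are kept separate.

from langchain_ollama import ChatOllama

reasoning_llm = ChatOllama(
    model="deepseek-r1:7b",  # reasoning model
    stop=["</think>"],  # stop generating once the reasoning block closes
)

answer_llm = ChatOllama(
    model="exaone3.5",  # answer model
    temperature=0,
)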
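Because `stop=["</think>"]` cuts generation at the closing tag, the reasoning model hands back only its reasoning trace. As a rough illustration of that split, here is a minimal sketch in plain Python. `split_reasoning` is a hypothetical helper (not part of LangChain or Ollama), and it assumes the raw output has the `<think>...</think>answer` shape that DeepSeek-R1-style models emit.

```python
from typing import Tuple


def split_reasoning(raw: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer) at the closing tag.

    Hypothetical helper for illustration only.
    """
    reasoning, sep, answer = raw.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", raw.strip()
    # Drop the opening tag if present, then trim whitespace.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


raw_output = "<think>The user asks about Docling.</think>Docling converts PDFs."
reasoning, answer = split_reasoning(raw_output)
print(reasoning)
print(answer)
```

With the stop sequence in place, only the first element of this pair is ever generated, which is why a second model is needed to write the answer.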

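3. Defining the RAG state

from typing import Annotated, List, TypedDict, Literal
from langgraph.graph.message import add_messages
from langchain_core.documents import Document

# RAG state definition
class RAGState(TypedDict):
    """State shared by the nodes of the RAG graph."""
    query: str  # user question
    thinking: str  # reasoning text from the reasoning model
    documents: List[Document]  # retrieved (and reranked) documents
    answer: str  # final answer from the answer model
    messages: Annotated[List, add_messages]  # message history, appended via the reducer
    mode: str  # "retrieve" or "generate", set by the classifier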
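Each node in the graph returns only the keys it changed, and those partial updates are merged into the state. A simplified mental model of that merge in plain Python follows. It assumes last-write-wins for ordinary keys and list concatenation for `messages`; the real `add_messages` reducer also deals with message IDs, which this sketch ignores. `merge_update` is a hypothetical name, not a LangGraph function.

```python
from typing import Any, Dict


def merge_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with a node's partial update applied (simplified)."""
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":
            # Reducer-style key: append instead of overwrite.
            merged[key] = list(merged.get(key, [])) + list(value)
        else:
            # Plain key: the latest value replaces the old one.
            merged[key] = value
    return merged


state = {
    "query": "What is Docling?",
    "thinking": "",
    "documents": [],
    "answer": "",
    "messages": [],
    "mode": "",
}
state = merge_update(state, {"mode": "retrieve"})
state = merge_update(state, {"messages": ["hello"]})
print(state["mode"], state["messages"])
```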

4. Parsing the document

Parsing is done with Docling.
It seems to extract even complex tables reasonably well.

from langchain_docling import DoclingLoader
from langchain_docling.loader import ExportType

FILE_PATH = "https://arxiv.org/pdf/2408.09869"

# This step takes a long time.
# Internally it parses the layout, then converts it with rule-based logic into a form an LLM can read easily.
loader = DoclingLoader(
    file_path=FILE_PATH,
    export_type=ExportType.MARKDOWN,
)

docs = loader.load()

5. Chunking

from langchain_text_splitters import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header_1"),
        ("##", "Header_2"),
        ("###", "Header_3"),
    ],
)

splits = [split for doc in docs for split in splitter.split_text(doc.page_content)]

for d in splits[:3]:
    print(f"- {d.page_content=}")
print("...")
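To make the idea of header-based chunking concrete, here is a minimal sketch in plain Python. It is a simplified approximation, not a reimplementation of `MarkdownHeaderTextSplitter` (for example, it does not treat code fences specially), and `split_by_headers` is a hypothetical name.

```python
from typing import Dict, List, Tuple

HEADERS = [("#", "Header_1"), ("##", "Header_2"), ("###", "Header_3")]


def split_by_headers(markdown: str) -> List[Tuple[Dict[str, str], str]]:
    """Return (metadata, text) chunks, starting a new chunk at each header."""
    # Check longer prefixes first so "###" is not mistaken for "#".
    ordered = sorted(HEADERS, key=lambda h: len(h[0]), reverse=True)
    chunks: List[Tuple[Dict[str, str], str]] = []
    current_meta: Dict[str, str] = {}
    buffer: List[str] = []

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append((dict(current_meta), text))
        buffer.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        matched = next(
            ((p, n) for p, n in ordered if stripped.startswith(p + " ")), None
        )
        if matched is None:
            buffer.append(line)
            continue
        flush()
        prefix, name = matched
        level = len(prefix)
        # A new header invalidates headers at the same or a deeper level.
        for p, n in HEADERS:
            if len(p) >= level:
                current_meta.pop(n, None)
        current_meta[name] = stripped[level:].strip()
    flush()
    return chunks


sample = "# Docling\nIntro text.\n## Features\nParses tables.\n## License\nMIT."
chunks = split_by_headers(sample)
for meta, text in chunks:
    print(meta, "->", text)
```

Each chunk carries the headers it sits under as metadata, which is what later makes it possible to see where a retrieved passage came from.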

6. Loading the embedding model

from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model = "bge-m3:latest",
)

7. Retrieval

Dense search with Qdrant

from langchain_qdrant import QdrantVectorStore
from langchain_qdrant import RetrievalMode

vector_store = QdrantVectorStore.from_documents(
    documents=splits,
    embedding = embeddings,
    location = ":memory:",
    collection_name = "rag_collection",
    retrieval_mode = RetrievalMode.DENSE
)

retriever = vector_store.as_retriever(search_kwargs = {"k": 10})
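Conceptually, dense search embeds the query and returns the `k` chunks whose vectors are closest to it. The sketch below shows that idea as a brute-force scan in plain Python, assuming cosine similarity as the metric. A vector database such as Qdrant uses an index rather than a full scan, and the tiny 2-dimensional vectors and the names `cosine` and `top_k` are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k([1.0, 0.0], doc_vecs, k=2)
print(hits)
```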

Rerank

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

model = HuggingFaceCrossEncoder(model_name = "BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=5)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
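The retrieve-then-rerank pattern above (fetch 10 candidates, keep the best 5) can be sketched without any model. In this minimal example `overlap_score` is a toy stand-in for the cross-encoder: it only counts shared words, whereas a real cross-encoder scores the query and document jointly with a neural model. All names here are hypothetical.

```python
import re
from typing import Callable, List


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words that appear in the doc."""
    q = set(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", doc.lower()))
    return len(q & d) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[str]:
    """Re-score first-stage candidates and keep the top_n best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


candidates = [
    "Qdrant is a vector database.",
    "Docling converts PDF documents to Markdown.",
    "Docling supports OCR for scanned PDF files.",
]
best = rerank("docling pdf markdown", candidates, overlap_score, top_n=2)
print(best)
```

The first stage is cheap and recall-oriented; the second stage is slower per document but only runs on the short candidate list.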

8. Final flow

This part was not covered in the lecture, so I wrote it myself.
The answers are pretty poor, probably because I did no prompt engineering or hyperparameter tuning.

from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

from langgraph.graph import START, StateGraph, END

# 1. Classification node. Important: this is a node function that updates the state
def classify_node(state: RAGState):
    """Classify the query and decide the processing mode."""
    query = state["query"]
    # Store the mode in the state
    if "Docling" in query:
        print("===Retrieval started===")
        return {"mode": "retrieve"}
    else:
        print("===Generation started===")
        return {"mode": "generate"}

# 2. Retrieval node: retrieve, then update the state
def retrieve_node(state: RAGState):
    """Retrieve documents and update the state."""
    query = state["query"]
    documents = compression_retriever.invoke(query)
    print(f"===Retrieval complete: {len(documents)} documents retrieved===")
    return {"documents": documents}

# 3. Reasoning node: updates the state
def think_node(state: RAGState):
    """Generate the reasoning with the reasoning model."""
    query = state["query"]
    documents = state["documents"]
    thinking = reasoning_llm.invoke(
        f"<think> {query} </think> {documents}"
    )
    # Extract only the content from the AIMessage
    thinking_content = thinking.content
    print(f"===Reasoning complete: {thinking_content}===")
    return {"thinking": thinking_content}

# 4. Answer node: updates the state
def answer_node(state: RAGState):
    """Generate the final answer with the answer model."""
    thinking = state["thinking"]
    # Build the prompt from the reasoning text. In "generate" mode the think
    # node is skipped and thinking is empty, so fall back to the raw query.
    prompt = thinking if thinking else state["query"]
    # Create the message and call the model
    message = HumanMessage(content=prompt)
    response = answer_llm.invoke([message])
    answer = response.content
    print(f"===Answer generation complete: {answer}===")
    return {"answer": answer}  # the answer must be returned as a dict

# 5. Build the workflow
workflow = StateGraph(RAGState)

# Classification node (the entry point, set below)
workflow.add_node("classify", classify_node)
# Conditional routing after the classification node
workflow.add_conditional_edges(
    "classify",
    lambda x: x["mode"],
    {
        "retrieve": "retrieve",
        "generate": "answer"
    }
)

# Retrieval node, then on to the reasoning node
workflow.add_node("retrieve", retrieve_node)
workflow.add_edge("retrieve", "think")

# Reasoning node, then on to the answer node
workflow.add_node("think", think_node)
workflow.add_edge("think", "answer")

# Answer node, then finish
workflow.add_node("answer", answer_node)
workflow.add_edge("answer", END)

# Set the entry point
workflow.set_entry_point("classify")

# Compile the graph
graph = workflow.compile()

# Function that runs the RAG system
def run_rag_system(query: str) -> str:
    """Run the RAG system and return the answer to the query."""
    result = graph.invoke({
        "query": query,
        "documents": [],
        "thinking": "",
        "answer": "",
        "messages": [],
        "mode": ""
    })
    return result["answer"]

# Example run
result = run_rag_system("Docling에 대해 설명해줘.")  # "Explain Docling."
print(f"Final answer: {result}")

Printed output from the run:

===Retrieval started===
===Retrieval complete: 5 documents retrieved===
===Reasoning complete: <think>
Alright, I need to explain Docling based on the provided information. Let me start by understanding what each document says.

Docling is an open-source PDF converter. The first document explains its extensibility with a model pipeline for customization. It mentions implementing a linear processing pipeline that works page-by-page. The introduction highlights its challenges in converting PDFs due to their variability and the gap between open-source tools and commercial ones.

Looking at the other documents, they talk about versioning (1, 2, 3, 4) but maybe it's just part of how the information is structured. The key features are:
- Converts PDFs to JSON or Markdown quickly.
- Understands page layout, reading order, figures, and table structures.
- Extracts metadata like title, authors, etc.
- Supports OCR for scanned PDFs.
- Works in batch or interactive mode.
- Can use GPUs for acceleration.

The abstract summarizes it as a self-contained MIT tool with state-of-the-art AI models. The future work includes adding more models like figures and equations, improving GPU support, and testing.

Putting this together, I should explain that Docling is designed to be simple, efficient, extensible, and covers various features for PDF conversion.
===
===Answer generation complete: Certainly! Here’s a concise explanation of Docling based on the provided information:

**Docling** is an open-source PDF converter designed to address the complexities and variability inherent in PDF documents through advanced AI capabilities. Here are its key features and strengths:

1. **Efficiency and Speed**: Docling quickly converts PDFs into structured formats like JSON or Markdown, making it highly efficient for both batch and interactive use cases.

2. **Advanced Parsing**: It excels at understanding complex PDF structures, including:
   - **Page Layout**: Accurate interpretation of text placement and layout.
   - **Reading Order**: Proper sequencing of content for readability.
   - **Figures and Tables**: Extraction and preservation of visual elements and tabular data.
   - **Metadata**: Extraction of essential metadata such as titles, authors, and other document properties.

3. **Versatility**: Docling supports OCR (Optical Character Recognition) for scanned PDFs, ensuring that even non-text content can be effectively processed.

4. **Flexibility**:
   - **Extensibility**: Built with a modular pipeline architecture, allowing for customization and integration with various AI models.
   - **Mode Support**: Operates seamlessly in both batch processing and interactive modes, catering to diverse user needs.
   - **Acceleration**: Leverages GPU support for faster processing, enhancing performance significantly.

5. **Open-Source and Community-Driven**: As an open-source tool, Docling benefits from community contributions and continuous improvement, aiming to bridge the gap between open-source and commercial PDF conversion tools.

6. **Future Enhancements**: Ongoing development focuses on expanding model capabilities (e.g., better figure and equation extraction), enhancing GPU utilization, and thorough testing to ensure robustness and reliability.

In summary, Docling stands out as a powerful, flexible, and community-supported tool designed to simplify and enhance PDF document conversion with state-of-the-art AI features.===
Final answer: (identical to the answer text printed above)
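In `think_node`, `{documents}` interpolates the Python repr of the whole `Document` list, metadata included, into the prompt. One possible alternative, sketched under the assumption that each document exposes a `page_content` string (as LangChain's `Document` does), is to join the contents into a numbered context block. `build_context`, `build_think_prompt`, and the `Doc` stand-in class are hypothetical names used only so the snippet runs without LangChain; I have not measured whether this improves the answers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (illustration only)."""
    page_content: str


def build_context(docs: List[Doc], max_chars: int = 2000) -> str:
    """Join document texts into a numbered block, truncating long ones."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        parts.append(f"[{i}] {doc.page_content.strip()[:max_chars]}")
    return "\n\n".join(parts)


def build_think_prompt(query: str, docs: List[Doc]) -> str:
    """Compose a prompt with the question first and the context after it."""
    return f"Question: {query}\n\nContext:\n{build_context(docs)}"


docs = [Doc("Docling converts PDFs to Markdown. "), Doc("It supports OCR.")]
prompt = build_think_prompt("What is Docling?", docs)
print(prompt)
```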

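I was impressed that the reasoning and answer steps are separated and that a different model is chosen for each.

In a previous project that used an sLM, I had also separated reasoning from answering. I had not seen it in any reference; the idea simply occurred to me and I tried it. Apparently this is an established way of doing it. 🤔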
