Semantic Search & RAG WordPress Project – Current Version Dec 2025

  1. Project Goal and Core Function

The primary goal of this project is to elevate the standard functionality of WordPress search from simple keyword matching to advanced semantic search and Retrieval-Augmented Generation (RAG). The semantic search capability ensures that the system finds content based on the meaning and intent of the user’s query, rather than just matching isolated keywords. Complementing this, the RAG feature implements a "Summarise Article" button. When clicked, it extracts the most relevant chunk of the source post (identified during the search) and uses the Gemini LLM to generate a concise summary of only that relevant, context-specific text chunk. For monitoring and debugging, the entire system utilizes Socket.IO to provide a real-time console showing every step of the backend process, including embedding generation, the search query, and the summarization calls.
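As a rough illustration, the real-time console feed might be wired on the Flask side along the lines below. This is a minimal sketch: the flask-socketio package, the "log" event name, and the message format are all assumptions, not taken from the project code.

```python
# Sketch: pushing backend progress messages to a browser console via Socket.IO.
# Assumes flask-socketio; the "log" event name and message format are illustrative.
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")  # allow the WordPress origin to connect

def console_log(message: str) -> None:
    """Broadcast one pipeline step (embedding, search, summarization) to all clients."""
    socketio.emit("log", {"message": message})

# Example usage inside a request handler:
#   console_log("Embedding query with all-minilm...")
#   console_log("Top 5 results found; returning to browser.")

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```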

  2. System Architecture and Key Technologies

This project operates on a hybrid architecture spanning three distinct environments. The WordPress environment acts as the Client and Frontend, managing the UI, handling user input (voice/text), and maintaining the AJAX and WebSocket connections. The Flask server serves as the Backend API and Logic, written in Python; it hosts the core search logic, loads the vector index into memory, and handles all communication with the LLMs. Finally, Ollama acts as the local Embedding Service, which Flask calls to generate the high-quality vector embeddings required for accurate semantic search.
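The Flask-to-Ollama hop can be pictured with a short sketch, assuming Ollama's default local port (11434) and its /api/embeddings endpoint:

```python
# Sketch: asking a local Ollama instance for an embedding vector.
# Assumes Ollama's default port 11434 and the all-minilm model named in this project.
import requests

def embed(text: str) -> list[float]:
    """Return the embedding vector Ollama produces for `text`."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "all-minilm", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]
```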

  3. Key Scripts, Directories, and Data

The system relies on four primary scripts and two essential data stores. The my-voice-search.php script, located in wp-content/plugins/my-voice-search/, acts as the WordPress Plugin Core. It defines the plugin metadata, enqueues the necessary CSS and JavaScript files, and registers the shortcodes (like [my_voice_search]) that inject the front-end UI and the real-time console onto a WordPress page. The speech-recognition.js file, found in the plugin’s js/ directory, contains the Client-Side Logic. It handles the Web Speech API input, sends queries to the Flask API at the /api/search endpoint, and manages the RAG-enabled summarization request sent to /api/summarize/<id>/<chunk_index>. It is also responsible for maintaining the Socket.IO connection for real-time output.

On the server side, the generate_embeddings.py script, located in the project root, is a one-time data-preparation script. It connects to the MySQL database, extracts and chunks the post content, uses the all-minilm model (via Ollama) to generate a vector for each chunk, and saves the resulting index to the wordpress_embeddings.json file. The semantic_search_api.py script, also in the project root, is the core Backend API that runs continuously. It loads the vector index into memory, performs the semantic search, hosts the Socket.IO server, and contains the RAG logic for retrieving the full text chunk and calling the Gemini LLM for generation. The full text content that the summarizer retrieves is stored across many individual .txt files within the full_posts_text_chunks/ directory.
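A condensed sketch of what the indexing pass might look like, with the chunk size, database credentials, table/column names, and the mysql-connector-python driver all assumed for illustration:

```python
# Sketch of the one-time indexing pass: chunk each published post, embed each
# chunk via Ollama, and persist the index. Chunk size, credentials, and the
# mysql-connector-python driver are assumptions, not taken from the project code.
import json
import mysql.connector
import requests

CHUNK_WORDS = 200  # assumed chunk size

def embed(text: str) -> list[float]:
    # Same Ollama call as the sketch above.
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "all-minilm", "prompt": text}, timeout=30)
    r.raise_for_status()
    return r.json()["embedding"]

def chunk_text(text: str) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + CHUNK_WORDS]) for i in range(0, len(words), CHUNK_WORDS)]

db = mysql.connector.connect(host="localhost", user="wp",
                             password="secret", database="wordpress")
cursor = db.cursor()
cursor.execute("SELECT ID, post_content FROM wp_posts WHERE post_status = 'publish'")

index = []
for post_id, content in cursor.fetchall():
    for chunk_index, chunk in enumerate(chunk_text(content)):
        index.append({"post_id": post_id, "chunk_index": chunk_index,
                      "embedding": embed(chunk)})
        # The raw chunk text would also be written to
        # full_posts_text_chunks/<post_id>_chunk_<chunk_index>.txt for the RAG step.

with open("wordpress_embeddings.json", "w") as f:
    json.dump(index, f)
```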

  4. Semantic Search and RAG Process Flow

The system features two main operational flows. The Semantic Search Flow begins when the user speaks or types a query into the WordPress UI. speech-recognition.js sends this query to the Flask /api/search endpoint. The semantic_search_api.py script then uses Ollama to convert the text query into a vector (embedding), performs a rapid vector similarity search against all stored vectors in the corpus_embeddings index, and sends the top 5 most relevant results (including the critical relevant_chunk_index) back to the browser for display.
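A minimal sketch of the similarity step, assuming cosine similarity over a NumPy matrix (the document only says "vector similarity search", so the metric is an assumption):

```python
# Sketch: score a query vector against every stored chunk vector, return top 5.
# Cosine similarity is assumed; metadata is a list parallel to the embedding rows,
# loaded from wordpress_embeddings.json.
import numpy as np

def top_5(query_vec: list[float], corpus_embeddings: np.ndarray,
          metadata: list[dict]) -> list[dict]:
    q = np.asarray(query_vec)
    q = q / np.linalg.norm(q)
    m = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
    scores = m @ q                       # one matrix-vector product scores everything
    best = np.argsort(scores)[::-1][:5]  # indices of the 5 highest similarities
    return [{**metadata[i], "score": float(scores[i]),
             "relevant_chunk_index": metadata[i]["chunk_index"]} for i in best]
```

Normalizing both sides up front reduces the whole search to a single matrix-vector product, which is why an in-memory index over all chunks stays fast.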

The RAG Summarization Flow is triggered when the user clicks the "Summarise Article" button. The client-side script reads the post-id and the chunk-index from the button and sends a GET request to the Flask /api/summarize endpoint. The Flask script uses these parameters to precisely locate and retrieve the full text content from the file: full_posts_text_chunks/<post_id>_chunk_<index>.txt. This retrieved text is then sent as the grounding content within the prompt to the Gemini LLM. The LLM generates the summary, which Flask returns to the client, allowing the JavaScript to seamlessly replace the original search snippet with the new, concise summary.
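The endpoint itself might look roughly like the sketch below; the gemini-1.5-flash model name and the prompt wording are assumptions, while google-generativeai is the package named in this document.

```python
# Sketch of the RAG endpoint: locate the exact chunk file, ground the prompt
# on it, and return Gemini's summary. Model name and prompt are illustrative.
import os
import google.generativeai as genai
from flask import Flask, jsonify

app = Flask(__name__)
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

@app.route("/api/summarize/<int:post_id>/<int:chunk_index>")
def summarize(post_id: int, chunk_index: int):
    path = f"full_posts_text_chunks/{post_id}_chunk_{chunk_index}.txt"
    with open(path, encoding="utf-8") as f:
        grounding_text = f.read()  # the retrieved chunk grounds the generation
    prompt = f"Summarise the following article excerpt concisely:\n\n{grounding_text}"
    response = model.generate_content(prompt)
    return jsonify({"summary": response.text})
```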

  5. The Python Virtual Environment (venv)

The Python Virtual Environment (venv) is essential for maintaining a stable development environment. It creates an isolated directory with its own Python interpreter and all necessary libraries (Flask, google-generativeai, etc.). This isolation ensures that the specific library versions required for this project do not conflict with those needed by other Python projects or the host machine's operating system, guaranteeing reproducibility and stability.
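For illustration, a typical setup might look like the sketch below; Flask and google-generativeai are named above, while flask-socketio, requests, and numpy are assumed extras.

```python
# Typical CLI setup (shown as comments):
#   python3 -m venv venv
#   source venv/bin/activate        # Windows: venv\Scripts\activate
#   pip install flask google-generativeai
#   pip install flask-socketio requests numpy   # assumed extras, not confirmed
#
# The same isolated environment can also be created from Python's stdlib:
import venv

venv.create("venv", with_pip=True)  # ./venv gets its own interpreter and pip
```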