<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Jordan Choo</title><description>Jordan Choo builds AI and SEO tools, writes about growth systems, and helps brands improve analytics, automation, and organic performance.</description><link>https://jordanchoo.com/</link><language>en-us</language><atom:link href="https://jordanchoo.com/rss.xml" rel="self" type="application/rss+xml"/><item><title>Building a GraphRAG Agent From The Ground Up</title><link>https://jordanchoo.com/blog/building-a-graphrag-agent-from-the-ground-up/</link><guid isPermaLink="true">https://jordanchoo.com/blog/building-a-graphrag-agent-from-the-ground-up/</guid><description>The more I dive into Claude Code and AI, the more I realize how much I have to learn about the systems and infrastructure that power them.</description><pubDate>Wed, 04 Feb 2026 17:38:08 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/seo/graph-visualization-1024x464.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The more I dive into Claude Code and AI, the more I realize how much I have to learn about the systems and infrastructure that power them.&lt;/p&gt;
&lt;p&gt;One piece of infrastructure that has always piqued my interest is &lt;a href=&quot;https://neo4j.com/docs/getting-started/graph-database/&quot;&gt;graph databases&lt;/a&gt; and using them for RAG applications (AKA &lt;a href=&quot;https://neo4j.com/blog/genai/what-is-graphrag/&quot;&gt;GraphRAG&lt;/a&gt;). The reason is that not only do they allow for more advanced querying but, as an SEO, it’s a concept that has had quite the buzz since 2012 thanks to &lt;a href=&quot;https://blog.google/products-and-platforms/products/search/introducing-knowledge-graph-things-not/&quot;&gt;Google&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;claude-code-graph-ccgraph&quot;&gt;Claude Code Graph (CCGraph)&lt;/h2&gt;
&lt;p&gt;So what did I build?&lt;/p&gt;
&lt;p&gt;Well, it all started thanks to a conversation with &lt;a href=&quot;https://www.linkedin.com/in/noahlearner&quot;&gt;Noah Learner&lt;/a&gt; about discovering new and useful Claude Code repos, a task that can be both overwhelming (thanks to the 5,000+ repos on GitHub that use the Claude Code topic) and highly personalized based on your workflow.&lt;/p&gt;
&lt;p&gt;After staring helplessly into the void for about 5 minutes on how to do this, I started poking around &lt;a href=&quot;https://api.github.com/search/repositories?q=topic:claude-code&amp;#x26;sort=stars&amp;#x26;order=desc&amp;#x26;per_page=100&amp;#x26;page=1&quot;&gt;GitHub’s public API for repos with Claude Code as the topic&lt;/a&gt; and realized this was the perfect pet project to learn to build an agentic GraphRAG app end to end.&lt;/p&gt;
&lt;p&gt;And 6 days of intermittent work later, &lt;a href=&quot;https://ccgraph.jordanchoo.com/&quot;&gt;Claude Code Graph (CCGraph)&lt;/a&gt; was born!&lt;/p&gt;
&lt;h2 id=&quot;the-journey&quot;&gt;The Journey&lt;/h2&gt;
&lt;p&gt;Now that we know the what, let’s talk about the how.&lt;/p&gt;
&lt;p&gt;Prior to this, my only experience working with Claude Code was 2 tiny test projects (a personal finance landing page quiz and a proof-of-concept transcription pipeline). Thankfully, because of those two projects I already had a bit of a feel for how to approach development and had previously created &lt;a href=&quot;https://github.com/JordanChoo/claude-code-starter/releases/tag/v1.0.0&quot;&gt;v1.0 of Claude Code Starter&lt;/a&gt;, which I used as the starting point for &lt;a href=&quot;https://ccgraph.jordanchoo.com/&quot;&gt;CCGraph&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;milestone-0-understanding-graph-databases-and-graphrag&quot;&gt;Milestone 0: Understanding Graph Databases and GraphRAG&lt;/h3&gt;
&lt;p&gt;Before even starting development I did a deep dive into really understanding what graph databases and GraphRAG are and how they work.&lt;/p&gt;
&lt;p&gt;This consisted of having long in-depth multi-day conversations with both Gemini and Claude, watching countless YouTube videos, and going through articles and documentation.&lt;/p&gt;
&lt;p&gt;Why do this rather than jumping in blindly?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To prevent brain rot by guiding strategy while Claude led the execution&lt;/li&gt;
&lt;li&gt;I actually want to understand the underlying technology (not enough people do this!)&lt;/li&gt;
&lt;li&gt;Claude does some really stupid shit and will gaslight you into allowing it to do dumb things&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re looking to learn more about graph databases and GraphRAG, these resources came in handy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Neo4j’s &lt;a href=&quot;https://graphacademy.neo4j.com/&quot;&gt;graph academy&lt;/a&gt;, &lt;a href=&quot;https://neo4j.com/blog/&quot;&gt;blog&lt;/a&gt;, &lt;a href=&quot;https://neo4j.com/resources/&quot;&gt;resources&lt;/a&gt;, and &lt;a href=&quot;https://www.youtube.com/neo4j&quot;&gt;YouTube channel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/@aiDotEngineer&quot;&gt;AI Engineer’s YouTube channel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/@ColeMedin&quot;&gt;Cole Medin’s YouTube channel&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;deciding-on-graph-structure&quot;&gt;Deciding on Graph Structure&lt;/h4&gt;
&lt;p&gt;Based on the learnings from my deep dive and brainstorming with Claude on how users will interact with the app, I decided on the following structure:&lt;/p&gt;
&lt;p&gt;Four types of nodes would be used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Author - The person who owns the repository&lt;/li&gt;
&lt;li&gt;Repository - The repository itself&lt;/li&gt;
&lt;li&gt;Topic - Any tagged topic that a repository may have&lt;/li&gt;
&lt;li&gt;Section - A chunked, vector-embedded version of the README.md and Claude.md&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The edges (the way the nodes are connected to each other) would be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;References - Relationship between a repo and another repo when it is mentioned by name&lt;/li&gt;
&lt;li&gt;Owned By - Relationship between the author and repo&lt;/li&gt;
&lt;li&gt;Has Topic - Relationship between a topic and repo&lt;/li&gt;
&lt;li&gt;Has Section - Relationship between the repo and the README.md/Claude.md&lt;/li&gt;
&lt;li&gt;Similar To - Relationship between two README.md/Claude.md chunks&lt;/li&gt;
&lt;li&gt;Related Topic - Relationship between two topics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/graph-database.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;With both the edges and nodes now mapped out, the LLM can easily traverse the graph, allowing it to find repos and topics that are connected or similar and present them with deep context for the user.&lt;/p&gt;
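&lt;p&gt;To make the structure concrete, here’s a minimal sketch of the schema as plain data along with a hypothetical Cypher statement builder; the actual labels and relationship names in CCGraph may differ:&lt;/p&gt;

```javascript
// A minimal sketch of the CCGraph schema as plain data. The labels and
// relationship names are my paraphrase of the post, not the real ones.
const NODE_TYPES = ["Author", "Repository", "Topic", "Section"];

const EDGE_TYPES = {
  REFERENCES: { from: "Repository", to: "Repository" },
  OWNED_BY: { from: "Repository", to: "Author" },
  HAS_TOPIC: { from: "Repository", to: "Topic" },
  HAS_SECTION: { from: "Repository", to: "Section" },
  SIMILAR_TO: { from: "Section", to: "Section" },
  RELATED_TOPIC: { from: "Topic", to: "Topic" },
};

// Build a (hypothetical) Cypher MERGE statement for one edge type.
function edgeToCypher(type) {
  const def = EDGE_TYPES[type];
  return `MATCH (a:${def.from}), (b:${def.to}) MERGE (a)-[:${type}]-&gt;(b)`;
}
```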
&lt;h3 id=&quot;milestone-1-initial-app--data-ingestions&quot;&gt;Milestone 1: Initial App &amp;#x26; Data Ingestion&lt;/h3&gt;
&lt;p&gt;The first step was collaborating with Claude on the initial PRD.&lt;/p&gt;
&lt;p&gt;I personally like taking a collaborative approach with the PRD development by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explaining in detail what I want to build, the technology that should be used for it, and what I want the users to learn/get value from&lt;/li&gt;
&lt;li&gt;Going back and forth with Claude, asking and answering questions about the backend logic, user experience, and any other items&lt;/li&gt;
&lt;li&gt;Having Claude write an initial PRD and then creating a 2nd session within the same folder and asking it to &lt;em&gt;“unapologetically and ruthlessly to find edge cases, potential bugs, and items that lack clarity”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Ruthless Claude typically finds 15-20 items that need to be improved, which I then go through item by item with it to collaborate on a solution&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You can take this a step further by having multiple sessions of Ruthless Claude re-review the updated PRD or even have a separate LLM (ChatGPT or Gemini) review it&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Once I’m happy with the solutions, Ruthless Claude updates the PRD and I have the original Claude start the build-out process, turning the refined PRD into an epic and individual issues&lt;/li&gt;
&lt;li&gt;After everything’s been pushed to GitHub, I let Claude Code start development in parallel, launching new agents whenever a new issue becomes unblocked&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/prd-example.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The initial PRD for &lt;a href=&quot;https://ccgraph.jordanchoo.com/&quot;&gt;CCGraph&lt;/a&gt; was focused on creating the foundational elements of the app which included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Backend server settings and UI&lt;/li&gt;
&lt;li&gt;Data ingestion pipeline&lt;/li&gt;
&lt;li&gt;Loading the data into Neo4j&lt;/li&gt;
&lt;li&gt;Outlining the chat proxy&lt;/li&gt;
&lt;li&gt;Developing error handling, security, and testing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The initial infrastructure used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://firebase.google.com/&quot;&gt;Firebase&lt;/a&gt; for hosting&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://neo4j.com/&quot;&gt;Neo4j&lt;/a&gt; as the graph database&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://vuejs.org/&quot;&gt;Vue&lt;/a&gt; for the framework&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://tailwindcss.com/&quot;&gt;Tailwind CSS&lt;/a&gt; to make it look pretty&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now that the easy part was done, it was time to start refining the chat responses.&lt;/p&gt;
&lt;h3 id=&quot;milestone-2-graphrag---prompt-workflows&quot;&gt;Milestone 2: GraphRAG - Prompt Workflows&lt;/h3&gt;
&lt;p&gt;The first iteration of &lt;a href=&quot;https://ccgraph.jordanchoo.com/&quot;&gt;CCGraph&lt;/a&gt; used a prompt workflow which looked like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User sends a query&lt;/li&gt;
&lt;li&gt;Neo4j is queried based on user’s query&lt;/li&gt;
&lt;li&gt;Response is then put into a system prompt&lt;/li&gt;
&lt;li&gt;LLM responds back to the user’s query&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/second-response.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Though it wasn’t perfect it kinda got the job done… not really…&lt;/p&gt;
&lt;p&gt;What I found is that the responses lacked relevancy and context for what the user was actually trying to learn from their query. The response was just spewing out repos willy-nilly.&lt;/p&gt;
&lt;p&gt;Three big things had to be changed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The user’s query had to be categorized so the app could figure out how to use Neo4j&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Initially Claude did this with regex but, we quickly migrated to using an LLM (re: Claude doing dumb things)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Structure for the Neo4j requests had to be implemented (e.g. sorting, filtering, aggregating)&lt;/li&gt;
&lt;li&gt;A relevancy score was developed for repos based on what the user was asking, blending section-level and repo-level cosine similarity&lt;/li&gt;
&lt;/ul&gt;
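&lt;p&gt;To illustrate the blended relevancy score, here’s a minimal sketch; the 0.6/0.4 weighting is purely illustrative, not the values CCGraph actually uses:&lt;/p&gt;

```javascript
// Sketch of a blended relevancy score: combine the best section-level
// cosine similarity with the repo-level similarity. The 0.6/0.4 split
// is an illustrative assumption, not CCGraph's real weighting.
function blendedScore(sectionSims, repoSim, sectionWeight = 0.6) {
  const bestSection = Math.max(...sectionSims);
  return sectionWeight * bestSection + (1 - sectionWeight) * repoSim;
}
```

A repo whose best section matches the query strongly can then outrank a repo that is only loosely similar overall.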
&lt;p&gt;After a lot of back and forth and prompt testing, the workflow moved to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User sends a query&lt;/li&gt;
&lt;li&gt;LLM categorizes the query&lt;/li&gt;
&lt;li&gt;Based on the category, Neo4j is queried&lt;/li&gt;
&lt;li&gt;Neo4j response is put into a system prompt&lt;/li&gt;
&lt;li&gt;LLM responds back to the user’s query&lt;/li&gt;
&lt;/ul&gt;
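&lt;p&gt;The updated workflow can be sketched roughly like this; the categories, keyword matching, and Cypher snippets are illustrative stand-ins for the real LLM categorizer and queries:&lt;/p&gt;

```javascript
// Sketch of categorize-then-query routing. categorizeQuery stands in
// for the LLM call; categories and Cypher are illustrative only.
function categorizeQuery(query) {
  const q = query.toLowerCase();
  if (q.includes("most starred") || q.includes("top")) return "ranking";
  if (q.includes("similar")) return "similarity";
  return "conceptual";
}

const CATEGORY_TO_QUERY = {
  ranking: "MATCH (r:Repository) RETURN r ORDER BY r.stars DESC LIMIT 10",
  similarity: "MATCH (s:Section)-[:SIMILAR_TO]-(t:Section) RETURN t",
  conceptual: "CALL db.index.vector.queryNodes(...)", // vector search fallback
};

function routeQuery(userQuery) {
  const category = categorizeQuery(userQuery);
  return { category, cypher: CATEGORY_TO_QUERY[category] };
}
```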
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/first-response.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Now we’re starting to get somewhere!&lt;/p&gt;
&lt;p&gt;Responses started getting a heck of a lot closer to what you’d expect as a user but, I really wanted to understand what was happening behind the scenes so that I could continue testing and tweaking prompts.&lt;/p&gt;
&lt;p&gt;And so came along…&lt;/p&gt;
&lt;h3 id=&quot;milestone-3-tracing--observability&quot;&gt;Milestone 3: Tracing &amp;#x26; Observability&lt;/h3&gt;
&lt;p&gt;Not only was adding observability and tracing important from an ops and refinement perspective but, knowing that down the road I plan on building production-level AI apps, it was something I knew would be key to understand.&lt;/p&gt;
&lt;p&gt;Eventually, I came across &lt;a href=&quot;https://langfuse.com/docs/observability/overview&quot;&gt;Langfuse&lt;/a&gt;: not only has it been heavily tested by others and packed with other key features (prompt management, A/B testing, evals, and a lot more) but, it’s open source and can easily be deployed on &lt;a href=&quot;https://elest.io/open-source/langfuse&quot;&gt;Elestio&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/prompt-session-scaled.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Implementation-wise it was a super easy one-prompt ask, simply telling Claude Code:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“I want to implement Langfuse’s observability and tracing (&lt;a href=&quot;https://langfuse.com/docs/observability/overview&quot;&gt;https://langfuse.com/docs/observability/overview&lt;/a&gt;) into the app for all prompts using the JS SDK (&lt;a href=&quot;https://github.com/langfuse/langfuse-js&quot;&gt;https://github.com/langfuse/langfuse-js&lt;/a&gt;)”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;And off it went, quickly implementing Langfuse in a couple of short minutes, only asking me for the API key along the way.&lt;/p&gt;
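&lt;p&gt;Conceptually, the tracing layer records something like the following per request; this is just the shape of a trace with nested spans, not the actual Langfuse SDK API:&lt;/p&gt;

```javascript
// Conceptual sketch of what tracing captures per request. This is NOT
// the Langfuse SDK API, just the idea: one trace per request, one span
// per workflow step, each with its input and output for later debugging.
function createTrace(name) {
  const spans = [];
  return {
    name,
    spans,
    span(spanName, input, output) {
      spans.push({ spanName, input, output, at: Date.now() });
    },
  };
}

// Usage: wrap each step of the workflow so failures can be inspected later.
const trace = createTrace("chat-request");
trace.span("categorize", "top repos?", "ranking");
trace.span("neo4j-query", "MATCH ...", "[10 rows]");
```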
&lt;p&gt;As I continued to test the prompts more and more the responses still felt a bit off and seemed to lack deep context and understanding of what was being asked.&lt;/p&gt;
&lt;p&gt;Digging more and more into the issue and going through resources I then naively thought “Let’s move this over to an agent, I’m sure it’ll be super easy…”&lt;/p&gt;
&lt;p&gt;…It was not&lt;/p&gt;
&lt;h4 id=&quot;prompt-workflows-vs-agents&quot;&gt;Prompt Workflows vs Agents&lt;/h4&gt;
&lt;p&gt;In case you aren’t aware, there’s a major difference between prompt workflows and agents.&lt;/p&gt;
&lt;p&gt;On one hand, prompt workflows are like assembly lines: quite linear, taking an input (the user query) through a pre-determined sequence of steps until a response comes back to the user.&lt;/p&gt;
&lt;p&gt;Agents, on the other hand, take a non-linear problem-solving approach where you feed them the user’s query and provide them with tools (e.g. pre-built database queries or 3rd party API endpoints) to respond back to the user in a dynamic way.&lt;/p&gt;
&lt;p&gt;If you want to learn more about the pros and cons of both, &lt;a href=&quot;https://www.confluent.io/learn/prompts-vs-workflows-vs-agents/&quot;&gt;Confluent has a great quick guide on them&lt;/a&gt;.&lt;/p&gt;
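&lt;p&gt;The difference can be sketched in a few lines; here &lt;em&gt;chooseAction&lt;/em&gt; is a hypothetical stand-in for the LLM deciding, at every step, whether to call a tool or answer:&lt;/p&gt;

```javascript
// Minimal sketch of an agent loop vs. a linear workflow: the model picks
// tools until it decides to answer. chooseAction stands in for the LLM.
function runAgent(query, tools, chooseAction, maxSteps = 5) {
  const observations = [];
  for (const step of Array(maxSteps).keys()) {
    const action = chooseAction(query, observations, step);
    if (action.type === "answer") return action.text;
    // Otherwise call the chosen tool and feed the result back in.
    observations.push(tools[action.tool](action.input));
  }
  return "No answer within the step budget";
}
```

A workflow would hard-code the sequence of steps; here the sequence emerges from the model's choices at runtime.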
&lt;h3 id=&quot;milestone-4-agent-overhaul&quot;&gt;Milestone 4: Agent Overhaul&lt;/h3&gt;
&lt;p&gt;Though, ultimately, it was the right decision (not just from a response quality standpoint but for learning), it required an entire architectural overhaul and a long time spent refining the prompt and tools used by the agent.&lt;/p&gt;
&lt;p&gt;Eventually, I landed on using &lt;a href=&quot;https://github.com/langchain-ai/langgraph&quot;&gt;LangGraph&lt;/a&gt; for the agent’s framework which (not so) coincidentally has a &lt;a href=&quot;https://langfuse.com/integrations/frameworks/langchain&quot;&gt;very tidy integration with Langfuse&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After a lot of going back and forth with Claude Code and some hand-holding, the migration over to an agent was complete but…&lt;/p&gt;
&lt;p&gt;…The responses from it were hot flaming trash&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/bad-agent-output.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Digging into the traces from Langfuse I realized two major things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I wasn’t providing enough tools for the agent to properly traverse the graph&lt;/li&gt;
&lt;li&gt;The agent prompt was garbage&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;building-out-a-toolset&quot;&gt;Building Out a Toolset&lt;/h4&gt;
&lt;p&gt;Initially the agent had two tools available to it:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Vector search&lt;/strong&gt; which used the previously implemented blended score system to help answer conceptual and broader questions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Structured query&lt;/strong&gt; that allowed the agent to search the graph for rankings, stats, filtering, and relationship traversal.&lt;/p&gt;
&lt;p&gt;After going back and forth with Claude Code, “Ruthless Claude”, and Gemini, we came up with a library of queries that a user may ask. Based on that, we identified a whole slew of gaps within the current tools.&lt;/p&gt;
&lt;p&gt;The structured query tool was expanded with 4 additional intents for pulling from the graph database, including references, similar repos, and related topics.&lt;/p&gt;
&lt;p&gt;README.md and Claude.md tools were added to allow users to do deep dives into how to use a specific repo and the workflows around it.&lt;/p&gt;
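&lt;p&gt;Put together, the expanded toolset looks roughly like this; the tool names and descriptions are my paraphrase of the post, and the handlers are stubs rather than real queries:&lt;/p&gt;

```javascript
// Sketch of the agent's tool registry after the expansion. Names,
// descriptions, and handlers are illustrative stand-ins.
const TOOLS = {
  vector_search: {
    description: "Blended section/repo similarity for conceptual questions",
    run: (query) => `vector results for: ${query}`,
  },
  structured_query: {
    description: "Rankings, stats, filtering, references, similar repos, related topics",
    run: (query) => `graph results for: ${query}`,
  },
  readme_lookup: {
    description: "Deep dive into a repo's README.md",
    run: (repo) => `README for ${repo}`,
  },
  claude_md_lookup: {
    description: "Deep dive into a repo's Claude.md and workflows",
    run: (repo) => `Claude.md for ${repo}`,
  },
};
```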
&lt;h4 id=&quot;improving-the-agent-prompt&quot;&gt;Improving The Agent Prompt&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/agent-prompt.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;As embarrassingly shown above, the initial prompt had a LOT of issues…&lt;/p&gt;
&lt;p&gt;Diving head first into the problem again with 3 of my favourite friends (Claude Code, “Ruthless Claude”, and Gemini), we slowly refined and tested the prompt to get to a place where the responses were much more useful.&lt;/p&gt;
&lt;p&gt;A few key changes that were made included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Providing a strategy (chain of thought) on how to approach a user’s query&lt;/li&gt;
&lt;li&gt;Describing each tool in greater detail along with labeled example questions&lt;/li&gt;
&lt;li&gt;Details on how to choose which tool to use and when&lt;/li&gt;
&lt;li&gt;Guidelines on how to deliver the response to the user in a helpful way&lt;/li&gt;
&lt;li&gt;Forcing no hallucinations&lt;/li&gt;
&lt;li&gt;Important rules that should always be followed&lt;/li&gt;
&lt;/ul&gt;
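&lt;p&gt;Assembling those pieces into a system prompt might look something like this; the wording is illustrative, not CCGraph’s actual prompt:&lt;/p&gt;

```javascript
// Sketch of how the improved agent prompt could be assembled from the
// parts listed above. All wording here is illustrative.
function buildAgentPrompt(toolDescriptions) {
  return [
    "## Strategy",
    "Think step by step: classify the question, pick a tool, then answer.",
    "## Tools",
    ...toolDescriptions.map(
      (t) => `- ${t.name}: ${t.description}. Example: ${t.example}`
    ),
    "## Response guidelines",
    "Be helpful and concise; cite repos by name.",
    "## Rules",
    "Never invent repos or stats; only use tool output.",
  ].join("\n");
}
```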
&lt;p&gt;After rolling out the new and improved tools along with a shiny new prompt, the agent was finally starting to be useful for users (at least for me).&lt;/p&gt;
&lt;h4 id=&quot;adding-response-feedback&quot;&gt;Adding Response Feedback&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/langfuse-feedback.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Though I’m happy with how the agent is performing, it’s still a long way from perfect.&lt;/p&gt;
&lt;p&gt;To help with this, and to get more experience with production-ready AI apps, scoring and feedback were added. This allows users to give the agent’s response a thumbs up or down along with a reason why the response was bad.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/langfuse-scores.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The results from the feedback are collected in Langfuse, where they can be used both to debug the agent and as evals for future tests.&lt;/p&gt;
&lt;h3 id=&quot;milestone-5-ux-overhaul--graph-view&quot;&gt;Milestone 5: UX Overhaul &amp;#x26; Graph View&lt;/h3&gt;
&lt;p&gt;The next step was polishing the UX of the app and making it look half decent.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/blog/design-system.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Severely lacking any sense of design and creativity, I fed &lt;a href=&quot;https://www.pencil.dev/&quot;&gt;Pencil&lt;/a&gt; my personal website (both URL and screenshots) as a baseline and had it build out a design system that was practically copied and pasted into &lt;a href=&quot;https://ccgraph.jordanchoo.com/&quot;&gt;CCGraph&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&quot;building-the-graph-visualization&quot;&gt;Building the Graph Visualization&lt;/h4&gt;
&lt;p&gt;Now that I had a half-decent look, the next step was to build out a graph visualization to allow users to visualize repos, authors, and topics and explore them further.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://jordanchoo.com/images/seo/graph-visualization-scaled.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Using &lt;a href=&quot;https://d3js.org/&quot;&gt;D3 as the library&lt;/a&gt;, a visualization of the LLM’s first response was integrated into the main chat interface as a sidebar.&lt;/p&gt;
&lt;p&gt;The initial graphs contained a lot of orphan nodes regardless of type (repo, author, and tag), which required going back and forth with Claude to identify the reason why and create a set of conditions for how each node should be displayed and interacted with.&lt;/p&gt;
&lt;p&gt;Ultimately, I ended up with a graph visualization where users can click into each node and find repos that fit within a certain topic or author, pre-populate a query about a specific repo, or even visit a repo’s GitHub page.&lt;/p&gt;
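&lt;p&gt;The orphan-node cleanup boils down to something like this before handing the data to D3; this is a sketch of the idea, not CCGraph’s actual code:&lt;/p&gt;

```javascript
// Sketch of orphan-node filtering for a force-directed graph: keep only
// nodes that appear in at least one link. Field names are assumptions.
function dropOrphans(nodes, links) {
  const connected = new Set();
  for (const link of links) {
    connected.add(link.source);
    connected.add(link.target);
  }
  return nodes.filter((n) => connected.has(n.id));
}
```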
&lt;h3 id=&quot;milestone-6-deployment&quot;&gt;Milestone 6: Deployment&lt;/h3&gt;
&lt;p&gt;With the development done, it was time to deploy on Firebase, and it could not have been easier.&lt;/p&gt;
&lt;p&gt;Thankfully, Claude Code makes it super simple, taking you by the hand and walking you step by step through what it’s doing and what it needs from you along the way.&lt;/p&gt;
&lt;h2 id=&quot;learnings-and-whats-next&quot;&gt;Learnings and What’s Next&lt;/h2&gt;
&lt;p&gt;Though it’s not the sexiest or most useful of apps, unlike &lt;a href=&quot;https://theseocommunity.com/resources/blog/building-ai-search-what-we-learned-along-the-way&quot;&gt;other ones that I’ve seen rolled out&lt;/a&gt;, it was the perfect excuse to learn new technology and get more comfortable with Claude Code.&lt;/p&gt;
&lt;p&gt;With that being said, a few key learnings that I had from my journey include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you have garbage responses, it’s because you have garbage prompts&lt;/li&gt;
&lt;li&gt;Create a ./tmp directory that doesn’t get included in git (include it in .gitignore) to have Claude save mini-PRDs, prompts, and other items that you can then edit or feed into other LLMs/sessions for feedback on&lt;/li&gt;
&lt;li&gt;Workflows and agents are not the same, and the way you prompt them and feed them context needs to be approached differently&lt;/li&gt;
&lt;li&gt;If you want to integrate a tool or library include links directly to the documentation and repos to make Claude’s life easier&lt;/li&gt;
&lt;li&gt;Take time to research, learn, and plan what you want to do rather than blindly trusting in Claude&lt;/li&gt;
&lt;li&gt;Observability is KEY when building AI powered applications&lt;/li&gt;
&lt;li&gt;Graph databases and GraphRAG come off as more intimidating than they actually are&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As far as what’s next, &lt;a href=&quot;https://ccgraph.jordanchoo.com/&quot;&gt;CCGraph&lt;/a&gt; will stay live and occasional updates will be rolled out based on insights from Langfuse and any mad scientist ideas I may get.&lt;/p&gt;
&lt;p&gt;On my side it’s a few things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using the learnings on graph databases and GraphRAG to build out production-grade applications that drive enterprise value&lt;/li&gt;
&lt;li&gt;Diving deeper into Claude Code by exploring tools like &lt;a href=&quot;https://github.com/steveyegge/beads/&quot;&gt;Beads&lt;/a&gt;, &lt;a href=&quot;https://github.com/ruvnet/claude-flow&quot;&gt;Claude Flow&lt;/a&gt;, &lt;a href=&quot;https://github.com/Dicklesworthstone/agentic_coding_flywheel_setup&quot;&gt;Agentic Flywheel&lt;/a&gt;, and &lt;a href=&quot;https://github.com/steveyegge/gastown&quot;&gt;Gas Town&lt;/a&gt; to name a few&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;Have thoughts? Leave a comment below&lt;/p&gt;
&lt;iframe src=&quot;https://www.linkedin.com/embed/feed/update/urn:li:share:7424872409771528192?collapsed=1&quot; height=&quot;600&quot; width=&quot;504&quot; frameborder=&quot;0&quot; allowfullscreen title=&quot;Embedded post&quot;&gt;&lt;/iframe&gt;</content:encoded></item><item><title>Finding Internal Link Opportunities at Scale with Vector Search</title><link>https://jordanchoo.com/blog/finding-internal-link-opportunities-at-scale-with-vector-search/</link><guid isPermaLink="true">https://jordanchoo.com/blog/finding-internal-link-opportunities-at-scale-with-vector-search/</guid><description>It seems that all the rage when it comes to AI and SEO has been around using it for some form of text generation. But, one of the most interesting features that I have yet to see really discussed is the usage of embeddings and vector search</description><pubDate>Fri, 08 Dec 2023 21:32:43 GMT</pubDate><content:encoded>&lt;p&gt;It seems that all the rage when it comes to AI and SEO has been around using it for some form of text generation. But, one of the most interesting features that I have yet to see really discussed is the usage of embeddings and vector search.&lt;/p&gt;
&lt;h2 id=&quot;what-are-emebddings&quot;&gt;What are Embeddings?&lt;/h2&gt;
&lt;p&gt;To understand what vector search is, you first need to know what embeddings are.&lt;/p&gt;
&lt;p&gt;Embeddings are essentially the translation of bodies of text (which I&apos;ll call documents) into numbers, which allows algorithms to better understand the content of the document.&lt;/p&gt;
&lt;p&gt;These documents could be as short as an H1 to as long as an in-depth article.&lt;/p&gt;
&lt;h2 id=&quot;what-is-vector-search&quot;&gt;What is Vector Search?&lt;/h2&gt;
&lt;p&gt;Once you have these embeddings (i.e. number representations of your documents), a vector search compares those numbers against other numbers (i.e. compares documents against each other) to measure how similar they are.&lt;/p&gt;
&lt;p&gt;The higher the similarity score, the more likely the documents are related.&lt;/p&gt;
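&lt;p&gt;In code, that comparison is typically cosine similarity between two embedding vectors; here&apos;s a minimal implementation:&lt;/p&gt;

```javascript
// Cosine similarity between two embedding vectors: 1 means the same
// direction (very similar documents), 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i !== a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```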
&lt;p&gt;&lt;em&gt;If you&apos;d like to dive deeper into the nitty gritty details of how vector search works, you can &lt;a href=&quot;https://openai.com/blog/introducing-text-and-code-embeddings&quot;&gt;read more about it on OpenAI&apos;s blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-use-vector-search-for-internal-links&quot;&gt;Why Use Vector Search for Internal Links?&lt;/h2&gt;
&lt;p&gt;So why the heck should you use vector search instead of using something like ScreamingFrog + regex?&lt;/p&gt;
&lt;p&gt;Well... instead of trying to find cases of whether a keyword is on a page or not, you&apos;re now able to find opportunities based on semantic similarity. In plain English that means you can flag internal links based on topical similarity.&lt;/p&gt;
&lt;h2 id=&quot;how-to-find-internal-linking-opportunities&quot;&gt;How To Find Internal Linking Opportunities&lt;/h2&gt;
&lt;p&gt;The following sections provide a step-by-step breakdown of this &lt;a href=&quot;https://github.com/JordanChoo/semantic-links&quot;&gt;GitHub repo and how the script works&lt;/a&gt;. Please note that the repo is simply a proof of concept and would need to be refined further to be production-ready.&lt;/p&gt;
&lt;h3 id=&quot;1-exporting--prepping-your-documents&quot;&gt;1. Exporting &amp;#x26; Prepping Your Documents&lt;/h3&gt;
&lt;p&gt;In my case, WordPress is typically the go-to CMS for the clients that I work with, and the platform thankfully allows you to &lt;a href=&quot;https://wordpress.com/support/export/&quot;&gt;export all of the pages or posts as an XML document&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once exported, I parse the XML file into an easy-to-use JSON object and then strip all of the internal links from the text:&lt;/p&gt;
&lt;pre class=&quot;shiki shiki-themes github-light github-dark&quot; style=&quot;--shiki-light:#24292e;--shiki-dark:#e1e4e8;--shiki-light-bg:#fff;--shiki-dark-bg:#24292e&quot; tabindex=&quot;0&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Get XML file&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;let&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; articlesXml &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; await&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; fs.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;readFileSync&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;ARTICLE_POSTS&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;utf8&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Parse XML file to JSON&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;let&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; articlesJson &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; await&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; convertXml.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;xml2js&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(articlesXml, {compact: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;true&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, spaces: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;4&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, ignoreComment: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;true&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;})&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Map (HTML to text + strip internal links)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;let&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; formattedArticles &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; articlesJson.rss.channel.item.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;map&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;((&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E36209;--shiki-dark:#FFAB70&quot;&gt;article&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; { &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;...&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;article,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        articleText: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;convertHtml&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(article[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;content:encoded&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;]._cdata, {linkBrackets: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;false&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, ignoreHref: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;true&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;})&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    };&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;});&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;2-translate-your-documents-into-embeddings&quot;&gt;2. Translate Your Documents into Embeddings&lt;/h3&gt;
&lt;p&gt;Once you have the documents ready, you need to get embeddings for them (that is, translate them into numerical vectors).&lt;/p&gt;
&lt;p&gt;OpenAI provides an easy-to-use &lt;a href=&quot;https://platform.openai.com/docs/api-reference/embeddings/create&quot;&gt;Embeddings endpoint&lt;/a&gt;: you send it a document and it returns the embedding.&lt;/p&gt;
&lt;p&gt;You can see how to do that here:&lt;/p&gt;
&lt;pre class=&quot;shiki shiki-themes github-light github-dark&quot; style=&quot;--shiki-light:#24292e;--shiki-dark:#e1e4e8;--shiki-light-bg:#fff;--shiki-dark-bg:#24292e&quot; tabindex=&quot;0&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// OpenAI Vectorize + Push to Pinecone&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;let&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; article &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;; article &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;&amp;#x3C;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; formattedArticles.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;length&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;; article&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;++&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Create embedding via OpenAI&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    let&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; embedding &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; await&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; openai.embeddings.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;create&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;({&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        model: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;text-embedding-ada-002&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        input: formattedArticles[article].articleText,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        encoding_format: &lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;float&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    });&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Add embedding data to JSON object&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    formattedArticles[article].embedding &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; embedding&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;3-save-your-embeddings-to-a-vector-database&quot;&gt;3. Save Your Embeddings to a Vector Database&lt;/h3&gt;
&lt;p&gt;Now that you have the embeddings, you can save them into a vector database; in my case I&apos;m using &lt;a href=&quot;https://www.pinecone.io/&quot;&gt;Pinecone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You not only want to push the embeddings to Pinecone but also make sure the ID you&apos;re using can easily be cross-referenced (&lt;em&gt;pro tip: use the document&apos;s unique ID from your CMS as the ID in Pinecone&lt;/em&gt;). You may also want to include additional metadata about the document, such as its category or tags from your CMS.&lt;/p&gt;
&lt;pre class=&quot;shiki shiki-themes github-light github-dark&quot; style=&quot;--shiki-light:#24292e;--shiki-dark:#e1e4e8;--shiki-light-bg:#fff;--shiki-dark-bg:#24292e&quot; tabindex=&quot;0&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;  // Chunk the articles&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;  const&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt; chunkedArticles&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; formattedArticles.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;reduce&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;((&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E36209;--shiki-dark:#FFAB70&quot;&gt;chunkedResults&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;--shiki-light:#E36209;--shiki-dark:#FFAB70&quot;&gt;article&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;--shiki-light:#E36209;--shiki-dark:#FFAB70&quot;&gt;index&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; { &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Determine which chunk this article belongs to (50 per chunk)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    const&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt; chunkIndex&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; Math.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;floor&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(index&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;/&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;50&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Start a new chunk&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;!&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;chunkedResults[chunkIndex]) {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        chunkedResults[chunkIndex] &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; [];&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Add the article to the chunk&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    chunkedResults[chunkIndex].&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;push&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(article)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; chunkedResults&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;}, []);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Target a Pinecone index&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;const&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt; pineconeIndex&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; pinecone.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;index&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;PINECONE_INDEX&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Send the chunks to Pinecone&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;const&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt; chunk&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; of&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; chunkedArticles) {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Create an empty embeddings array&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    let&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; embeddings &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; [];&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Push the embedding of each article to the embeddings array&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;const&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt; article&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt; of&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; chunk) {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        embeddings.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;push&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;({&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;            id: article[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;wp:post_id&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;]._text,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;            values: article.embedding.data[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;].embedding,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;            metadata: {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;                category: article.category._cdata.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;toLowerCase&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;            }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;        });&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Push embedding to Pinecone&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;    await&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; pineconeIndex.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;upsert&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(embeddings);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;    // Provide confirmation of saving&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    console.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;log&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;`Pushed ${&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;chunk&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;length&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;} article embeddings to Pinecone`&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Save data to a JSON file&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;fs.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;writeFileSync&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;./output/article-embeddings.json&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;JSON&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;stringify&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(formattedArticles));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;4-compare-your-link-target-embedding-with-your-vector-database&quot;&gt;4. Compare Your Link Target Embedding with Your Vector Database&lt;/h3&gt;
&lt;p&gt;This is where the rubber finally hits the road. Take the WordPress post ID of the URL you&apos;re trying to find links to (I&apos;ll call this the target document), which should also be the ID of that document in Pinecone, and ask Pinecone for documents that are similar to it. In my case I am requesting the top 50 similar documents.&lt;/p&gt;
&lt;p&gt;Pinecone will then send back a slew of results, each with a score between 0 and 1, where 0 is irrelevant and 1 is identical.&lt;/p&gt;
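Under the hood, that score is typically a cosine similarity between the two embedding vectors (the exact metric depends on how your Pinecone index was configured). As a rough illustration of what is being computed, not Pinecone's actual implementation:

```javascript
// Cosine similarity: dot(a, b) divided by the product of the vector magnitudes.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const magB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (magA * magB);
}

// Orthogonal (unrelated) vectors score 0; identical directions score ~1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
console.log(cosineSimilarity([1, 2, 3], [1, 2, 3])); // ≈ 1
```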
&lt;p&gt;The list is a great start, but next we need to filter it down to actual opportunities. I do this by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Excluding the target document itself&lt;/li&gt;
&lt;li&gt;Removing results that are below a certain score threshold (I recommend a minimum of 0.7)&lt;/li&gt;
&lt;li&gt;Removing results that are already linking to your target document&lt;/li&gt;
&lt;li&gt;Cleaning up the results into something that is human readable&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;// Get matched opportunities from Pinecone
let opps = await pinecone.index(PINECONE_INDEX).query({ topK: 50, id: TARGET_ARTICLE_ID})

// Get Target Article Info
let targetArticleInfo = formattedArticles.filter(function(target) {
    return target[&apos;wp:post_id&apos;]._text === TARGET_ARTICLE_ID
})

// Filter
let filteredOpps = opps.matches.filter(function(opp) {
    // Remove target article &amp;#x26; articles below the scoreThreshold
    return opp.id !== TARGET_ARTICLE_ID &amp;#x26;&amp;#x26; opp.score &gt;= SCORE_THRESHOLD;
})

// Merge Pinecone Results + WP Data
let finalOpp = filteredOpps
    // Keep only opportunities that exist in the WordPress export
    .filter(opp =&gt; formattedArticles.some(wp =&gt; wp[&apos;wp:post_id&apos;]._text === opp.id))
    // Add the WP link, title, category and HTML
    .map(opp =&gt; {
        const wp = formattedArticles.find(article =&gt; article[&apos;wp:post_id&apos;]._text === opp.id);
        return {
            targetUrl: targetArticleInfo[0].link._text,
            ...opp,
            link: wp.link._text,
            category: wp.category ? wp.category._cdata : &apos;&apos;,
            title: wp.title._cdata,
            htmlContent: wp[&apos;content:encoded&apos;]._cdata
        };
    })
    // Remove articles already linking to the target
    .filter(opp =&gt; !opp.htmlContent.includes(targetArticleInfo[0].link._text))
    // Drop fields that aren&apos;t needed in the CSV output
    .map(({ htmlContent, values, sparseValues, metadata, ...rest }) =&gt; rest);
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;5-profit&quot;&gt;5. Profit!&lt;/h3&gt;
&lt;p&gt;Last but most definitely not least is saving the link opportunities into a nice and tidy CSV file so you can do a final manual spot check and start building those internal links:&lt;/p&gt;
&lt;pre class=&quot;shiki shiki-themes github-light github-dark&quot; style=&quot;--shiki-light:#24292e;--shiki-dark:#e1e4e8;--shiki-light-bg:#fff;--shiki-dark-bg:#24292e&quot; tabindex=&quot;0&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Save output as CSV&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;fs.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;writeFileSync&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;./output/opps-&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;+&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;TARGET_ARTICLE_ID&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;+&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;.csv&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;--shiki-light:#D73A49;--shiki-dark:#F97583&quot;&gt;await&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; json2csv.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;json2csv&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(finalOpp));&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#6A737D;--shiki-dark:#6A737D&quot;&gt;// Send success message&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;console.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#6F42C1;--shiki-dark:#B392F0&quot;&gt;log&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;`There were ${&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;finalOpp&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;length&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;} link opportunities found for the URL ${&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt; targetArticleInfo&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;].&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;link&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;_text&lt;/span&gt;&lt;span style=&quot;--shiki-light:#032F62;--shiki-dark:#9ECBFF&quot;&gt;}`&lt;/span&gt;&lt;span style=&quot;--shiki-light:#24292E;--shiki-dark:#E1E4E8&quot;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;closing-thoughts&quot;&gt;Closing Thoughts&lt;/h2&gt;
&lt;p&gt;In the testing I conducted, I found accuracy to be the biggest issue: some opportunities with high similarity scores were not topically relevant, while some lower-scoring opportunities were more relevant.&lt;/p&gt;
&lt;p&gt;A few ideas for improving the accuracy of your results include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.pinecone.io/docs/metadata-filtering&quot;&gt;Filtering results by metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Vectorizing and searching by page title rather than body content&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.pinecone.io/docs/weighting-sparse-and-dense-vectors&quot;&gt;Using hybrid search with weighted sparse and dense vectors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
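As an example of the first idea, metadata filtering is a small change to the query from step 4. This is only a sketch: `TARGET_ARTICLE_ID` and the `'seo'` category value are placeholders, and it assumes the `category` metadata attached during the upsert in step 3.

```javascript
// Sketch: restrict the similarity search to articles in the same category.
const TARGET_ARTICLE_ID = '123'; // placeholder WordPress post ID
const TARGET_CATEGORY = 'seo';   // placeholder category value

const query = {
  topK: 50,
  id: TARGET_ARTICLE_ID,
  filter: { category: { $eq: TARGET_CATEGORY } }, // only match same-category articles
  includeMetadata: true
};

// Then, as in step 4 (assuming an initialized Pinecone client):
// let opps = await pinecone.index(PINECONE_INDEX).query(query);
```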
&lt;p&gt;I hope you enjoyed the walkthrough and that it got your gears turning on how you can use embeddings and vector search in your day-to-day SEO tasks.&lt;/p&gt;</content:encoded></item><item><title>How to Save GoogleBot &amp; BingBot IP Addresses to BigQuery</title><link>https://jordanchoo.com/blog/how-to-save-googlebot-bingbot-ip-addresses-to-bigquery/</link><guid isPermaLink="true">https://jordanchoo.com/blog/how-to-save-googlebot-bingbot-ip-addresses-to-bigquery/</guid><description>Big Crawler IPs is part of a much larger initiative of providing SEOs and marketers with open source data warehousing tools</description><pubDate>Fri, 30 Jun 2023 19:10:07 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;Big Crawler IPs is part of a much larger initiative of providing SEOs and marketers with open source data warehousing tools&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;As I dive deeper into the world of SEO, whether with my own personal projects or with the clients I work with, I&apos;m realizing that having a data warehouse is becoming more and more important.&lt;/p&gt;
&lt;p&gt;One data source that I&apos;ve found to be a big pain to consistently collect is log files, as they typically require server-level access and a developer.&lt;/p&gt;
&lt;p&gt;Thankfully, there is a super handy tool called &lt;a href=&quot;https://logflare.app/&quot;&gt;LogFlare&lt;/a&gt; that sits on top of &lt;a href=&quot;https://cloudflare.com/&quot;&gt;CloudFlare&lt;/a&gt; and takes care of all of the heavy lifting when it comes to &lt;a href=&quot;https://docs.logflare.app/backends/bigquery/&quot;&gt;collecting and storing log data in BigQuery&lt;/a&gt;, which happens to be my data warehouse of choice.&lt;/p&gt;
&lt;p&gt;As amazing as LogFlare is, it&apos;s simply a firehose of log data into your warehouse (the extracting and loading parts of &lt;a href=&quot;https://en.wikipedia.org/wiki/Extract,_load,_transform&quot;&gt;ELT&lt;/a&gt;). There isn&apos;t any filtering or transformation; that onus is on you (as it should be).&lt;/p&gt;
&lt;h2 id=&quot;why-build-this&quot;&gt;Why Build This&lt;/h2&gt;
&lt;p&gt;So if you&apos;re trying to build out an SEO data warehouse, one of the first filtering steps you should take with log files is &quot;authenticating&quot; the data to make sure that it is actually &lt;a href=&quot;https://developers.google.com/static/search/apis/ipranges/googlebot.json&quot;&gt;GoogleBot&lt;/a&gt; or &lt;a href=&quot;https://www.bing.com/toolbox/bingbot.json&quot;&gt;BingBot&lt;/a&gt; crawling your site rather than a tool such as SiteBulb or ScreamingFrog.&lt;/p&gt;
&lt;p&gt;To do this you have to rely on the IP address rather than the User-Agent. Thankfully both GoogleBot and BingBot provide you with a list of IP addresses.&lt;/p&gt;
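For illustration, here is a rough sketch of pulling the CIDR ranges out of those published JSON files. The `extractPrefixes` helper and the trimmed-down sample are my own; the file shape assumed here (a `prefixes` array of `ipv4Prefix`/`ipv6Prefix` entries) is based on the GoogleBot file, so verify it before relying on it.

```javascript
// Sketch: extract the IPv4/IPv6 CIDR ranges from a published bot IP JSON file.
function extractPrefixes(botJson) {
  return botJson.prefixes
    .map(p => p.ipv4Prefix || p.ipv6Prefix) // each entry has one or the other
    .filter(Boolean);
}

// Trimmed-down sample in the GoogleBot file's shape:
const sample = {
  creationTime: '2023-06-30T00:00:00.000000',
  prefixes: [
    { ipv4Prefix: '66.249.64.0/27' },
    { ipv6Prefix: '2001:4860:4801:10::/64' }
  ]
};

console.log(extractPrefixes(sample));
// [ '66.249.64.0/27', '2001:4860:4801:10::/64' ]
```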
&lt;p&gt;So how do you get these IP addresses into your warehouse?&lt;/p&gt;
&lt;p&gt;Well, that is where &lt;a href=&quot;https://github.com/JordanChoo/big-crawler-ips&quot;&gt;Big Crawler IPs&lt;/a&gt; comes into play.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/JordanChoo/big-crawler-ips&quot;&gt;Download the code on GitHub Here&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;how-it-works&quot;&gt;How It Works&lt;/h2&gt;
&lt;p&gt;The code lives within a &lt;a href=&quot;https://cloud.google.com/functions&quot;&gt;Google Cloud Function&lt;/a&gt; which is periodically triggered by a &lt;a href=&quot;https://cloud.google.com/scheduler&quot;&gt;Cloud Scheduler&lt;/a&gt; HTTP request.&lt;/p&gt;
&lt;p&gt;Once the Cloud Function is triggered, the official &lt;a href=&quot;https://developers.google.com/static/search/apis/ipranges/googlebot.json&quot;&gt;GoogleBot&lt;/a&gt; and &lt;a href=&quot;https://www.bing.com/toolbox/bingbot.json&quot;&gt;BingBot&lt;/a&gt; IP address JSON files are read and cross-referenced with the IPs already in BigQuery; any IP address missing from BigQuery is then added to the table.&lt;/p&gt;
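The cross-referencing step above boils down to a set difference. Here is a minimal sketch of just that diff; `missingIps` is a hypothetical helper, and fetching the published files and querying BigQuery for the stored rows are left out.

```javascript
// Sketch: given the prefixes published in the bot JSON files and the prefixes
// already stored in BigQuery, return only the ones that still need inserting.
function missingIps(publishedIps, storedIps) {
  const stored = new Set(storedIps);
  return publishedIps.filter(ip => !stored.has(ip));
}

console.log(missingIps(['66.249.64.0/27', '66.249.64.32/27'], ['66.249.64.0/27']));
// [ '66.249.64.32/27' ]
```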
&lt;h2 id=&quot;how-to-deploy-your-own&quot;&gt;How To Deploy Your Own&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://youtu.be/la8kppC8Fwk&quot;&gt;Watch the deployment walkthrough on YouTube&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hope you find this tool handy, and if you have any feedback, feel free to reach out on Twitter (&lt;a href=&quot;https://twitter.com/JordanChoo&quot;&gt;@JordanChoo&lt;/a&gt;).&lt;/p&gt;