Recategorising over 3,000 blog posts

I have been frustrated with the state of the categories on this blog for some years now. That’s right – years. This site has been going in one form or another for over 20 years – the first post being in 2004 – and there are over 3,000 posts on here in total.

Over that time, my use of categories in WordPress has been inconsistent. The result is a morass of categories, and I am never quite sure which one to choose when posting.

So, I wanted to recategorise everything reasonably efficiently, and thought that with a mix of Google’s Gemini LLM tool and some choice WordPress plugins, I might be able to get it done.

Delete the old categories

First step – after backing everything up of course – was to remove all the categories I had been using (there were hundreds). To do this I used the free WP Bulk Delete plugin, which was very easy. I then deactivated and uninstalled the plugin to keep things neat.

Export post data

Next, I used the free version of WP All Export to produce a CSV file of every post (the post ID, title, and content) on the blog. That done, again the plugin was deactivated and uninstalled.
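
Before going any further, it may be worth checking that the export contains what you expect. The following is a minimal sketch rather than part of the original process. It assumes the CSV has columns named id, Title and Content (the names the script later in this post relies on; your export may use different ones), and it uses a tiny in-memory sample in place of the real file. The function name summarise_export is made up for illustration.

```python
import csv
import io


def summarise_export(handle, required=("id", "Title", "Content")):
    """Count rows, list missing columns, and count posts with empty content."""
    reader = csv.DictReader(handle)
    # Column names are an assumption; adjust `required` to match your export
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    rows = list(reader)
    empty_content = sum(1 for r in rows if not (r.get("Content") or "").strip())
    return {
        "rows": len(rows),
        "missing_columns": missing,
        "empty_content": empty_content,
    }


# Stand-in for the real export. For a real file, pass something like
# open("your-export.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "id,Title,Content\n"
    "1,Links for Monday,<a href='#'>a link</a>\n"
    "2,Woken by a cat,\n"
)

summary = summarise_export(sample)
print(summary)
```

If the row count is far from the number of posts shown in the WordPress admin, or a column is reported missing, that is easier to fix now than after the categories have been assigned.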

Define the new categories

Now I needed to come up with a set of categories to cover the content in the blog. Being lazy, I decided to ask Gemini to do it for me. Here’s the prompt, which I got to after a bit of back and forth with Gemini:

**Role:** You are an expert WordPress Archivist and Data Taxonomist.

**Context:** I have uploaded a CSV containing 3,031 blog posts from a 20-year archive. The current categorization is a mess. I need to completely restructure the site content into a clean, modern, and mutually exclusive taxonomy.

**Goal:** We need to assign one single, perfect category to every post in this CSV so I can import it back into WordPress.

1. Scan the “Title” and “Content/Excerpt” columns of the uploaded CSV to understand the breadth of topics covered over the last 20 years.

2. Propose a list of **12-15 high-level categories**.
– These must be mutually exclusive (no overlap).
– They must be comprehensive (cover all topics in the file).
– Use simple, standard naming conventions (e.g., “Technology,” “Travel,” “Personal,” not “My Cool Tech Stuff”).

3. Output this proposed list to me with a brief explanation of what belongs in each.

This produced 12 suggested categories, which seemed fine to me.

  • Digital Government

    Content: Public sector digital transformation, local government (councils), civic tech, and policy discussions.

  • Social Media & Community

    Content: Analysis of platforms (Twitter, Facebook, etc.), online community management, and social networking trends.

  • Technology & Software

    Content: General tech news, software reviews (e.g., Zoho, Gmail), hardware, and internet tools.

  • Web & WordPress

    Content: Web design, development, specific WordPress plugins/themes, and CMS discussions.

  • Blogging & Content

    Content: The practice of blogging, writing, online journalism, and digital media trends.

  • Open Data & Transparency

    Content: Open government data, freedom of information, transparency, and data standards (“Mash the State”).

  • Productivity & Work

    Content: Workflow tips, remote work, collaboration tools, and personal effectiveness.

  • Events & Speaking

    Content: Conference write-ups, presentation slides, workshops, and event announcements.

  • Links

    Content: Your archive contains hundreds of “Links for [Date]” and “Bookmarks” posts. These are distinct resources and deserve their own home.

  • Daily Notes

    Content: Short, micro-blog style updates, status messages, and quick thoughts (e.g., “Five for Friday”).

  • Reviews

    Content: Specific reviews of books, products, or services that don’t fit strictly into “Technology.”

  • Personal

    Content: Life updates, off-topic reflections (e.g., “Woken by a cat”), and general musings.

Match posts to new categories

Now I needed to match each of my 3,000+ posts with one of these 12 categories and list them in a CSV file.

Gemini suggested a Python script, so I fired up a terminal window and ran this script:

import pandas as pd

# 1. Load your CSV
# Make sure the filename matches your export exactly
df = pd.read_csv('Posts-Export-2026-January-15-1048.csv')

# 2. Define the Categories and Keywords
categories = {
    "Curated Links": {
        "keywords": ["links for", "bookmarks", "reading list", "worth reading", "link dump", "daily links"],
        "weight": 10
    },
    "Digital Government": {
        "keywords": ["council", "localgov", "gov", "civic", "policy", "public sector", "democracy", "e-gov", "citizen", "government", "public service", "whitehall", "civil service"],
        "weight": 1
    },
    "Social Media & Community": {
        "keywords": ["twitter", "facebook", "social media", "community", "network", "platform", "linkedin", "instagram", "social network", "online community"],
        "weight": 1
    },
    "Web & WordPress": {
        "keywords": ["wordpress", "plugin", "theme", "css", "html", "web design", "cms", "blog engine", "php", "javascript", "code", "developer", "site", "website"],
        "weight": 1
    },
    "Technology & Software": {
        "keywords": ["google", "apple", "software", "app", "iphone", "mac", "tech", "tool", "hardware", "zoho", "gmail", "browser", "firefox", "chrome", "device", "mobile", "internet"],
        "weight": 1
    },
    "Open Data & Transparency": {
        "keywords": ["open data", "foi", "transparency", "mashup", "dataset", "freedom of information", "data", "ckan", "statistics"],
        "weight": 1
    },
    "Productivity & Work": {
        "keywords": ["productivity", "work", "gtd", "email", "inbox", "office", "remote", "collaboration", "workflow", "management", "meeting", "career"],
        "weight": 1
    },
    "Events & Speaking": {
        "keywords": ["conference", "camp", "barcamp", "presentation", "slides", "event", "talk", "session", "meetup", "workshop", "govcamp", "speaking"],
        "weight": 1
    },
    "Blogging & Content": {
        "keywords": ["blogging", "writing", "journalism", "media", "post", "content", "publish", "blogger", "feed", "rss"],
        "weight": 1
    },
    "Reviews": {
        "keywords": ["review", "book review", "product review", "thoughts on", "impression"],
        "weight": 2
    },
    "Daily Notes": {
        "keywords": ["five for friday", "update", "status", "note", "snippet", "aside", "quick update"],
        "weight": 1
    },
    "Personal": {
        "keywords": ["holiday", "cat", "home", "life", "family", "personal", "thoughts", "rant", "weekend", "music", "film"],
        "weight": 1
    }
}

def categorize_post(row):
    title = str(row['Title']).lower()
    content = str(row['Content']).lower()

    # Priority Rule: Curated Links
    if any(k in title for k in categories["Curated Links"]["keywords"]):
        return "Curated Links"

    # Priority Rule: Daily Notes (Short content check)
    if len(content) < 200 and "href" not in content and len(title) < 30:
        return "Daily Notes"

    # Scoring Logic
    scores = {cat: 0 for cat in categories}

    def score_text(text, multiplier=1):
        for cat, data in categories.items():
            for keyword in data['keywords']:
                if keyword in text:
                    scores[cat] += (1 * multiplier * data.get('weight', 1))

    # Title gets 3x weight
    score_text(title, multiplier=3)

    # Check first 1000 chars of content
    score_text(content[:1000], multiplier=1)

    best_cat = max(scores, key=scores.get)

    if scores[best_cat] == 0:
        return "Personal"

    return best_cat

# 3. Apply the function
df['New_Category'] = df.apply(categorize_post, axis=1)

# 4. Save the result
output_filename = 'Categorized_Posts_Archive.csv'
df[['id', 'New_Category']].to_csv(output_filename, index=False)

print(f"Success! File saved as: {output_filename}")

This worked as expected and spat out a new CSV file listing each post ID next to its new category.
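
Before importing anything, it can help to look at how the posts were spread across the categories and to catch obvious problems. The following is a rough sketch using only the Python standard library, assuming the output file has the id and New_Category columns written by the script. The function name check_assignments and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# The twelve category names used as keys in the script
ALLOWED = {
    "Curated Links", "Digital Government", "Social Media & Community",
    "Web & WordPress", "Technology & Software", "Open Data & Transparency",
    "Productivity & Work", "Events & Speaking", "Blogging & Content",
    "Reviews", "Daily Notes", "Personal",
}


def check_assignments(handle, allowed=ALLOWED):
    """Return per-category counts plus a list of problems found."""
    counts = Counter()
    seen_ids = set()
    problems = []
    for row in csv.DictReader(handle):
        post_id, category = row["id"], row["New_Category"]
        if post_id in seen_ids:
            problems.append(f"duplicate id {post_id}")
        seen_ids.add(post_id)
        if category not in allowed:
            problems.append(f"unknown category {category!r} for id {post_id}")
        counts[category] += 1
    return counts, problems


# Made-up rows standing in for Categorized_Posts_Archive.csv
sample = io.StringIO(
    "id,New_Category\n"
    "1,Curated Links\n"
    "2,Personal\n"
    "3,Personal\n"
    "3,Misc\n"
)

counts, problems = check_assignments(sample)
for category, n in counts.most_common():
    print(f"{n:5d}  {category}")
print(problems)
```

A category that swallows a large share of the archive, or one that is nearly empty, is a hint that its keywords need another look before the import.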

Apply new categories to posts in WordPress

Now I needed to get WordPress to look at this CSV file and update the content database so that each post is assigned to the right new category. To do that I used WP All Import, which I paid for – although I’m not sure I actually needed to.

This stage of the process was the most frustrating as WP All Import is necessarily quite fiddly, and it took about 5 goes to get it to do what I wanted. But I got there in the end.

The result

Well, in a sense, success! In the space of less than an hour I have recategorised over 3,000 posts on my blog into just 12 different categories.

On the other hand… it turns out I hate these categories, and the way Gemini associated them with posts is often really stupid.
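
One likely contributor, judging by the script, is that the test "keyword in text" matches substrings, so the keyword "cat" also matches "category" and "app" matches "happy". The sketch below compares that behaviour with whole-word matching using Python's re module. It is a possible tweak, not part of the script Gemini produced; the helper names and the sample sentence are made up for illustration.

```python
import re


def substring_hits(text, keywords):
    """Keywords found anywhere in the text, as in the script's `in` test."""
    return [k for k in keywords if k in text]


def whole_word_hits(text, keywords):
    """Keywords found only as complete words or phrases."""
    hits = []
    for k in keywords:
        # \b marks a word boundary, so "cat" no longer matches "category"
        if re.search(r"\b" + re.escape(k) + r"\b", text):
            hits.append(k)
    return hits


text = "happy to report the new category structure is live"
keywords = ["cat", "app", "gov", "category"]

loose = substring_hits(text, keywords)
strict = whole_word_hits(text, keywords)
print(loose)
print(strict)
```

The trade-off is that whole-word matching also stops "council" from matching "councils", so plural forms would need to be listed as keywords in their own right.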

The learning

Using Gemini made this possible. On my own, I couldn’t have done it. This is 100% true of the Python scripting, which I have zero knowledge of, but it would also have taken me ages to figure out the steps I needed to take (I am sure they seem obvious to others!).

However, I will inevitably redo the process, with categories and definitions I have produced myself, to try and get better results.

For now, though, it’ll do.
