A roadmap for modern digital government:
Our plan to build a modern digital government that works for you – making life easier, driving growth and delivering smarter, more efficient public services
An online notebook
Let’s see how long I can keep this up for!
This week’s worky highlights:
Not really work stuff:
Media consumption:
Fixed an issue in my Tidycal appointment booking thingy, where Teams meetings were being booked but no meeting link provided. It was a simple case of disconnecting the accounts and then reconnecting them (yup, turn it off and on again, basically). Not a biggie but I’ve had to do it several times now, so might look for another solution.
Wow – Microsoft Copilot blamed by Chief Constable for Maccabi Tel Aviv ban:
The incident and subsequent review have reignited debates about police decision-making, community engagement, and the balance between public safety and fairness in policing high-profile events. It has also highlighted the hallucination risks posed by AI technologies such as Microsoft Copilot if not used properly. West Midlands Police has pledged to improve its processes.
So much to unpack there.
Now I’m wondering whether I should just put everything pre-2026 into an ‘Archive’ category and just start afresh from this year…
A thought just occurred – part of the reason why the categories are in a mess is probably because tags didn’t exist in WordPress until 2007, so putting multiple categories on a post was a regular thing in the early days.
I have been frustrated with the state of the categories on this blog for some years now. That’s right – years. This site has been going in one form or another for over 20 years – the first post being in 2004 – and there’re over 3,000 posts on here in total.
Over that time, my use of categories in WordPress has been variable. This has resulted in a morass of categories, and I am never quite sure which one to choose when posting.
So, I wanted to find a way to recategorise everything in a reasonably efficient way, and thought with a mix of Google’s Gemini LLM tool and some choice WordPress plugins, I might be able to get it done.
First step – after backing everything up of course – was to remove all the categories I was using at the time (there were hundreds). To do this I used the free WP Bulk Delete plugin, which was very easy. I then deactivated and uninstalled the plugin to keep things neat.
Next, I used the free version of WP All Export to produce a CSV file of every post (the post ID, title, and content) on the blog. That done, again the plugin was deactivated and uninstalled.
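Before going further it is worth a quick check that the export contains what the later steps need. This is a minimal sketch, assuming the column names `id`, `Title` and `Content` that the script further down relies on. The `missing_columns` helper and the sample row are made up for illustration, and a real run would open the exported file rather than an in-memory string.

```python
import csv
import io

# Column names taken from the script later in this post (an assumption about the export).
REQUIRED = {"id", "Title", "Content"}


def missing_columns(csv_file, required=REQUIRED):
    """Return the set of required column names absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    return set(required) - set(reader.fieldnames or [])


# Hypothetical stand-in for the exported file.
sample = io.StringIO('id,Title,Content\n1,"Links for today","Some post content"\n')
print(missing_columns(sample))  # an empty set means nothing is missing
```

If the header uses different capitalisation (for example `ID` rather than `id`), the check reports the name as missing, which is the kind of mismatch that would otherwise only surface when the script runs.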
Now I needed to come up with a set of categories to cover the content in the blog. Being lazy, I decided to ask Gemini to do it for me. Here’s the prompt, which I got to after a bit of back and forth with Gemini:
**Role:** You are an expert WordPress Archivist and Data Taxonomist.
**Context:** I have uploaded a CSV containing 3,031 blog posts from a 20-year archive. The current categorization is a mess. I need to completely restructure the site content into a clean, modern, and mutually exclusive taxonomy.
**Goal:** We need to assign one single, perfect category to every post in this CSV so I can import it back into WordPress.
—
1. Scan the “Title” and “Content/Excerpt” columns of the uploaded CSV to understand the breadth of topics covered over the last 20 years.
2. Propose a list of **12-15 high-level categories**.
– These must be mutually exclusive (no overlap).
– They must be comprehensive (cover all topics in the file).
– Use simple, standard naming conventions (e.g., “Technology,” “Travel,” “Personal,” not “My Cool Tech Stuff”).
3. Output this proposed list to me with a brief explanation of what belongs in each.
This produced 12 suggested categories, which seemed fine to me.
Digital Government. Content: Public sector digital transformation, local government (councils), civic tech, and policy discussions.
Social Media & Community. Content: Analysis of platforms (Twitter, Facebook, etc.), online community management, and social networking trends.
Technology & Software. Content: General tech news, software reviews (e.g., Zoho, Gmail), hardware, and internet tools.
Web & WordPress. Content: Web design, development, specific WordPress plugins/themes, and CMS discussions.
Blogging & Content. Content: The practice of blogging, writing, online journalism, and digital media trends.
Open Data & Transparency. Content: Open government data, freedom of information, transparency, and data standards (“Mash the State”).
Productivity & Work. Content: Workflow tips, remote work, collaboration tools, and personal effectiveness.
Events & Speaking. Content: Conference write-ups, presentation slides, workshops, and event announcements.
Curated Links. Content: Your archive contains hundreds of “Links for [Date]” and “Bookmarks” posts. These are distinct resources and deserve their own home.
Daily Notes. Content: Short, micro-blog style updates, status messages, and quick thoughts (e.g., “Five for Friday”).
Reviews. Content: Specific reviews of books, products, or services that don’t fit strictly into “Technology.”
Personal. Content: Life updates, off-topic reflections (e.g., “Woken by a cat”), and general musings.
Now I needed to match each of my 3,000+ posts with one of these 12 categories and list them in a CSV file.
Gemini suggested a Python script, so I fired up a terminal window and ran this script:
import pandas as pd

# 1. Load your CSV
# Make sure the filename matches your export exactly
df = pd.read_csv('Posts-Export-2026-January-15-1048.csv')

# 2. Define the Categories and Keywords
categories = {
    "Curated Links": {
        "keywords": ["links for", "bookmarks", "reading list", "worth reading", "link dump", "daily links"],
        "weight": 10
    },
    "Digital Government": {
        "keywords": ["council", "localgov", "gov", "civic", "policy", "public sector", "democracy", "e-gov", "citizen", "government", "public service", "whitehall", "civil service"],
        "weight": 1
    },
    "Social Media & Community": {
        "keywords": ["twitter", "facebook", "social media", "community", "network", "platform", "linkedin", "instagram", "social network", "online community"],
        "weight": 1
    },
    "Web & WordPress": {
        "keywords": ["wordpress", "plugin", "theme", "css", "html", "web design", "cms", "blog engine", "php", "javascript", "code", "developer", "site", "website"],
        "weight": 1
    },
    "Technology & Software": {
        "keywords": ["google", "apple", "software", "app", "iphone", "mac", "tech", "tool", "hardware", "zoho", "gmail", "browser", "firefox", "chrome", "device", "mobile", "internet"],
        "weight": 1
    },
    "Open Data & Transparency": {
        "keywords": ["open data", "foi", "transparency", "mashup", "dataset", "freedom of information", "data", "ckan", "statistics"],
        "weight": 1
    },
    "Productivity & Work": {
        "keywords": ["productivity", "work", "gtd", "email", "inbox", "office", "remote", "collaboration", "workflow", "management", "meeting", "career"],
        "weight": 1
    },
    "Events & Speaking": {
        "keywords": ["conference", "camp", "barcamp", "presentation", "slides", "event", "talk", "session", "meetup", "workshop", "govcamp", "speaking"],
        "weight": 1
    },
    "Blogging & Content": {
        "keywords": ["blogging", "writing", "journalism", "media", "post", "content", "publish", "blogger", "feed", "rss"],
        "weight": 1
    },
    "Reviews": {
        "keywords": ["review", "book review", "product review", "thoughts on", "impression"],
        "weight": 2
    },
    "Daily Notes": {
        "keywords": ["five for friday", "update", "status", "note", "snippet", "aside", "quick update"],
        "weight": 1
    },
    "Personal": {
        "keywords": ["holiday", "cat", "home", "life", "family", "personal", "thoughts", "rant", "weekend", "music", "film"],
        "weight": 1
    }
}


def categorize_post(row):
    title = str(row['Title']).lower()
    content = str(row['Content']).lower()

    # Priority Rule: Curated Links
    if any(k in title for k in categories["Curated Links"]["keywords"]):
        return "Curated Links"

    # Priority Rule: Daily Notes (Short content check)
    if len(content) < 200 and "href" not in content and len(title) < 30:
        return "Daily Notes"

    # Scoring Logic
    scores = {cat: 0 for cat in categories}

    def score_text(text, multiplier=1):
        for cat, data in categories.items():
            for keyword in data['keywords']:
                if keyword in text:
                    scores[cat] += (1 * multiplier * data.get('weight', 1))

    # Title gets 3x weight
    score_text(title, multiplier=3)
    # Check first 1000 chars of content
    score_text(content[:1000], multiplier=1)

    best_cat = max(scores, key=scores.get)
    if scores[best_cat] == 0:
        return "Personal"
    return best_cat


# 3. Apply the function
df['New_Category'] = df.apply(categorize_post, axis=1)

# 4. Save the result
output_filename = 'Categorized_Posts_Archive.csv'
df[['id', 'New_Category']].to_csv(output_filename, index=False)
print(f"Success! File saved as: {output_filename}")
This worked as expected and spat out a new CSV file listing post ids next to the new category.
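A quick way to sanity-check a file like this before importing it is to count how many posts landed in each category. This is a minimal sketch using only the Python standard library, assuming the `New_Category` column name from the script above. The `category_counts` helper is hypothetical and the rows are invented, not my real data.

```python
import csv
import io
from collections import Counter


def category_counts(csv_file, column="New_Category"):
    """Count how many rows fall into each category."""
    return Counter(row[column] for row in csv.DictReader(csv_file))


# Invented rows standing in for Categorized_Posts_Archive.csv
sample = io.StringIO(
    "id,New_Category\n"
    "101,Personal\n"
    "102,Curated Links\n"
    "103,Personal\n"
)

# Print categories from most to least used
for name, count in category_counts(sample).most_common():
    print(f"{name}: {count}")
```

If one category swallows a large share of the archive, that is a hint the keywords need another look before anything gets imported.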
Now I needed to get WordPress to look at this CSV file and update the content database so that each post is assigned to the right new category. To do that I used WP All Import, which I paid for – although I’m not sure I actually needed to.
This stage of the process was the most frustrating as WP All Import is necessarily quite fiddly, and it took about 5 goes to get it to do what I wanted. But I got there in the end.
Well, in a sense, success! In the space of less than an hour I have recategorised over 3,000 posts on my blog into just 12 different categories.
On the other hand… it turns out I hate these categories, and the way Gemini associated them with posts is often really stupid.
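Part of the problem may be how the script matches keywords: `keyword in text` is a plain substring test, so "cat" also matches "category" and "work" matches "network". Here is a minimal sketch of one possible fix, matching whole words with a regular expression. The `contains_word` helper is hypothetical and not part of the script Gemini produced.

```python
import re


def contains_word(keyword, text):
    """True if keyword appears in text as a whole word or phrase, not inside another word."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text) is not None


title = "a new category for network posts"

# Plain substring matching, as in the script above
print("cat" in title, "work" in title)  # True True

# Whole-word matching
print(contains_word("cat", title), contains_word("work", title))  # False False
print(contains_word("network", title))  # True
```

The trade-off is that whole-word matching misses plurals such as "councils", so the keyword lists would need extending to make up for it.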
Using Gemini made this possible. On my own, I couldn’t have done it. This is 100% true of the Python scripting, which I have zero knowledge of, but also it would have taken me ages to figure out the steps I needed to take (am sure they seem obvious to others!).
However, I will inevitably redo the process, with categories and definitions I have produced myself, to try and get better results.
For now, though, it’ll do.
Some major reworking of this blog over the last few days. Dramatically simplified things – sadly the micropost format that Steph helped me build didn’t make the cut.
Instead I am making use of the default WordPress ‘asides’ post format to add what used to be microposts. This has helped me cut down the number of plugins running on the site significantly.
Also, I removed Google Analytics because I never looked at it, and don’t really care how many people read this anyway.
I also replaced the theme, switching from GeneratePress to Blocksy – which I know is a bit bloated but which I understand and can do things with.
I’ve added a bit to the About page to describe how the blog works, plugins and custom code used, and so on, if you are sufficiently interested to look.
Roger Swannell – Getting agile governance right:
In a traditional waterfall or stage-gated development process quality assurance and approval checks happen towards the end of the work. It makes sense if you’re optimising for efficiency as all the quality check governance happens when the development is as close to its live and finished state as possible. But it’s based on the assumption that it’s possible to know ahead of the development work all the possible considerations and implications, which approval checks can refer to and confirm met or not.
The agile perspective is that it’s not possible to know all those implications ahead of doing the development work and so the better approach is to get smaller feedback more regularly and respond to it more quickly. This is better than waiting until near the end because changes are easier to make whilst development work is in progress. This approach optimises for effectiveness where getting it right is more important than doing it quickly.
Wise words as always from Catherine Howe – Change is a layered thing:
There is an alchemy to taking stuff you have already and turning it into something different but I think that’s the essence of organisational change. You are always working with the fabric and nature of the organisation as it is in order to help it become something renewed.
Terrific list of recommended posts from Steve Messer.