The (MVP) making of PodHistory

In this post I share my process for building the MVP for PodHistory, from idea to launched product, and what my next steps will be.

The idea

At the heart of this idea was the simple question: When did that podcast talk about that topic before?

In the past I asked myself this question every now and again, and I also noticed hosts of many podcasts linking back to old episodes for the same reason.

However, I was never happy with how hard it is to find the exact episode. If there are show notes you may find it that way; otherwise it’s more or less guesswork and listening into many episodes until you eventually find the right one.

Building the basics

The building phase started more or less by accident when I was playing with ASR models [1] to transcribe my own audio notes.

Seeing the (at least for me) improved capabilities of these models to create “good enough” transcripts, I quickly built out a simple database structure using my framework of choice, Django.

The first version simply took the generated transcripts, loaded them into the database and made them searchable in the Django admin interface.

From there I built out a minimal user interface with simple server-side rendered HTML and a minimal database structure to represent podcasts, their episodes and the actual transcripts.
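To make the shape of that data structure concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of Django models. The table and column names (podcast, episode, transcript, feed_url, audio_url) and both helper functions are my own assumptions for illustration, not the actual PodHistory schema.

```python
import sqlite3

# Hypothetical schema: one podcast has many episodes,
# and each episode has one transcript.
SCHEMA = """
CREATE TABLE podcast (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    feed_url TEXT NOT NULL UNIQUE
);
CREATE TABLE episode (
    id INTEGER PRIMARY KEY,
    podcast_id INTEGER NOT NULL REFERENCES podcast(id),
    title TEXT NOT NULL,
    audio_url TEXT NOT NULL
);
CREATE TABLE transcript (
    id INTEGER PRIMARY KEY,
    episode_id INTEGER NOT NULL UNIQUE REFERENCES episode(id),
    text TEXT NOT NULL
);
"""


def create_db(path=":memory:"):
    """Create a database with the sketch schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_episodes(conn, keyword):
    """Return titles of episodes whose transcript contains the keyword.

    SQLite's LIKE is case-insensitive for ASCII characters by default.
    """
    rows = conn.execute(
        "SELECT e.title FROM transcript t "
        "JOIN episode e ON e.id = t.episode_id "
        "WHERE t.text LIKE ? ORDER BY e.id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]
```

A single-keyword lookup like find_episodes(conn, "django") is roughly what a plain text search box over the transcripts boils down to.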

Taking code from some of my previous projects for things like handling multiple domains, RSS feed parsing and a simple job queue, I built out a working MVP that takes a podcast’s RSS feed and fully automatically ingests, transcribes and presents episodes.
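The ingestion step starts with the feed itself. As a rough sketch of what parsing a podcast RSS feed involves, the snippet below uses only Python's standard library. The function names parse_feed and new_episodes are hypothetical, and the real implementation may well use a dedicated feed parsing library instead.

```python
import xml.etree.ElementTree as ET


def parse_feed(xml_text):
    """Parse RSS 2.0 text and return (podcast_title, episodes).

    Each episode is a dict with "title", "guid" and "audio_url".
    Items without an <enclosure> (no audio file) are skipped.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: missing <channel>")

    episodes = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue
        episodes.append(
            {
                "title": item.findtext("title", default="").strip(),
                "guid": item.findtext("guid", default="").strip(),
                "audio_url": enclosure.get("url"),
            }
        )
    return channel.findtext("title", default="").strip(), episodes


def new_episodes(episodes, known_guids):
    """Keep only episodes whose guid has not been ingested yet."""
    return [ep for ep in episodes if ep["guid"] not in known_guids]
```

Each episode returned by new_episodes would then be handed to the job queue for download and transcription.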

Whipping up a quick landing page design based on a premium template was relatively easy.

The hardest part was deciding on a good way to define the benefits of this service and the actual pricing. I’m not fully happy with either and will keep learning and tweaking them throughout the launch phase.

Things I did not build

In an MVP, not building things is as important as building things. These are some of the bigger things I didn’t build and my reasoning behind each.

A payment processor integration:

I can do that manually in the payment processor’s dashboard for now.

User accounts & self service for podcasters to change things:

Most data is automatically parsed from podcast feeds, and anything beyond that I will have to do manually. In my opinion, that manual work is worth the time saved by not having to implement users and permissions.

Advanced search in transcripts for more than one keyword:

I know that SQL is capable of great things given my data structure, but building out an intuitive web component to input complex searches is beyond my skills. I will happily pay someone to do it in the future, once the product is validated.
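For the SQL half of that problem, a multi-keyword search can be as small as one LIKE clause per keyword joined with AND. The sketch below assumes a hypothetical transcript table with episode_id and text columns. It is not something PodHistory ships today.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_keyword_query(keywords):
    """Build a parameterized query matching transcripts with ALL keywords."""
    terms = [k.strip() for k in keywords if k.strip()]
    if not terms:
        raise ValueError("at least one keyword is required")
    # One "text LIKE ?" condition per keyword, combined with AND.
    where = " AND ".join("text LIKE ? ESCAPE '\\'" for _ in terms)
    sql = f"SELECT episode_id FROM transcript WHERE {where} ORDER BY episode_id"
    params = [f"%{escape_like(t)}%" for t in terms]
    return sql, params


def search_transcripts(conn, keywords):
    """Return the ids of episodes whose transcript contains every keyword."""
    sql, params = build_keyword_query(keywords)
    return [row[0] for row in conn.execute(sql, params)]
```

Only the placeholders and values vary with user input, so the query stays parameterized. The hard part remains the interface for entering such searches, not the query.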

A word on deployment

I went with my usual playbook for hosting backend apps, keeping it as simple as possible:

A DigitalOcean droplet runs my preferred web server, Caddy, and the Django app I built, which lives in a virtual environment and is managed by supervisord.

The database is staying SQLite for now, since the read performance is more than enough for at least 10-20 podcasts with moderate traffic.

This stack may seem overly simplistic and maybe even prone to failure, but I find it is enough to validate the product and can easily be made better in the future by introducing multiple hosts and migrating the database to a managed offering.

Next up: Validation phase

Having launched a workable MVP, I will now continue with validating the product.

Since I haven’t done this before, this will surely be a long path involving lots of learning. Right now I’m thinking of giving myself about 6 months to learn and execute on this, but that may change once I improve in this area.

Right now my playbook looks something like this:

  • Identify podcasts with a tech-forward mindset that could benefit from searchable transcripts
  • Manually add demo sites for them with the latest 4-10 episodes
  • Do cold outreach through LinkedIn, email and other platforms where possible

The initial goal will be to get 2-3 podcasts using the platform, with the assumption being that additional podcasts will become easier to acquire once podcasters and listeners alike see it working with real podcasts.

  1. Automatic speech recognition models, for example OpenAI’s Whisper
