Episode 8 · Tuesday, 28 April 2026

Under the Hood

Technical metadata and structured transcripts transform invisible audio into searchable assets that build long-term domain authority and capture specific user queries.

Schema.org· CNAME records· Search Console· Transcript indexing· Domain authority· PodHerd

By How to Get Discovered | 13m listen | 9 chapters



About this episode

Google and Bing interpret podcast landing pages as structured documents rather than visual layouts, making technical legibility a requirement for modern discoverability. Maya and Tom establish a strict vocabulary rule for this technical deep dive, requiring every three-letter acronym to be defined for the audience. The episode centers on how Schema.org markup and invisible metadata dictate whether a search engine identifies a page as a specialized podcast episode or a generic blog post.

Effective search indexing relies on a three-step process of discovery, crawling, and ranking. Tom explains that machine-generated transcripts often fail as a wall of text, whereas breaking content into addressable moments with timestamps creates unique URLs for specific topics like freelance rate negotiation. Using a custom domain via CNAME records ensures that search authority and compound interest accrue to the creator rather than a hosting provider. Tom reveals his own PodHerd experiment, where he is currently testing these discoverability theories on a starter tier domain to validate Maya's claims about long-term search performance.

Google Search Console is the essential free tool for tracking query data and impressions, but it is unavailable to creators stuck on platform-hosted subdomains. Tom admits to his own technical shortcuts in the PodHerd setup, while Maya holds the line on technical clarity. The episode concludes with a look toward data compounding and the long-term benefits of owning the digital phonebook entry for a podcast brand.

CHAPTER 01 / 9 Discussion

Technical Vocabulary Rules for Podcast Discoverability

Maya and Tom introduce a technical episode of How to Get Discovered focused on transcript indexing and search engine behavior. Maya establishes a ground rule requiring Tom to define every technical term or three-letter acronym used, such as structured data, to ensure the conversation remains accessible to listeners. Tom hints at a personal admission regarding his own podcast setup to be revealed later in the episode.

technical terms· acronyms· structured data· podcast indexing· definitions

00:00 Welcome back to How to Get Discovered. I'm Maya. And I'm Tom! HTGD is the show where we argue about how podcasts get found. Last week was the synthesis episode, and Tom said it was the most useful conversation he'd had, which I have transcribed for my records. Don't! I've transcribed it. Today's episode is a technical one: Under the Hood, how transcript indexing actually works. Why search engines treat a transcript page differently from a show notes page. What makes a transcript good for search? And what makes a transcript a wall of useless text? I want to set a rule for this episode. Set the rule. The rule is, every time you use a technical word that's not in normal English... ...I get to make you explain it.

00:48 No three-letter acronyms without a translation. No phrases like structured data without a definition. If a listener has to look something up to follow the conversation, we've failed. That's a fair rule. It's a fair rule. I'll try... I may slip. Tom will catch me. I will catch you. And this is gonna come up later: I have something to admit at some point in this episode. Oh good. I'm setting it up early so you can be patient. I'll be patient. Let's get into it. Okay, I want to start with the question that is actually the technical heart of the season, which is: why is a transcript on a properly structured page different from the same transcript pasted into a regular show notes page? Because they're not the same to a search engine. Right, but why not? To a human reader they look identical. Same words, same content.

CHAPTER 02 / 9 Discussion

Search Engine Perception of Structured Podcast Pages

Search engines like Google and Bing interpret webpages as structured documents rather than visual layouts. While humans see headings and players, search engines read invisible metadata and tags that define whether a page is a recipe, product, or podcast episode. Properly identifying a page as a podcast episode allows search engines to surface specific moments in carousels and specialized search panels.

search engines· metadata· podcast episode page· layout· indexing

01:47 Maybe the same length. If you scroll past both, they look like text with a podcast player on top. To a human reader? Sure! To a search engine, what's different? To a search engine, almost everything. And the difference isn't visible from the outside. It's in the underlying structure of the page. The bits that the search engine reads that the human doesn't see. And the question is whether those invisible bits actually matter. They do. A lot! And I'm going to spend the next half hour explaining why, in terms you can actually use. So the first thing to understand is that when a search engine looks at a webpage (Google, Bing, the AI search ones… all of them) it doesn't see the page the way YOU see it. You see a layout: you see headings and paragraphs and a player... and maybe a sidebar with links to other episodes.

02:40 The search engine sees, at root, a document. A document with a structure. Some bits are titles. Some bits are headings. Some bits are body text. Some bits are metadata about the page itself. Some bits are signals about what the page is for. Stop! Define metadata? Right. Metadata is data about data—in this case, data about the page. The page has a title. The page has a description. The page has tags that say what kind of page it is—is it an article? Is it a podcast episode? Is it a recipe? Is it a product? Those tags are metadata and they're written in the page in a way that the human visitor doesn't see but the search engine does

03:28 Okay, so a page can tell a search engine "I am specifically a podcast episode page" rather than just "I am some page." Exactly. And that turns out to matter a lot, because the search engine can then treat it as a podcast episode page: it can show it in different ways in search results, it can pull out the transcript and surface specific moments, it can show the episode in podcast-specific carousels and panels, none of which it can do if it just sees a generic page. This is the structured data thing you promised you'd say for this episode? This is the structured data thing.

04:06 Should I define structured data? Yes, please. Structured data is the bits of the page that follow a specific format that search engines have agreed on. There's a standard called schema dot org, which is basically a vocabulary. You can mark up your page to say: this is a podcast episode, here's its title, here's its description, here's its duration, here's its host, here's the show it belongs to, here's the transcript. All in a format the search engine recognizes. So instead of guessing what the page is, it knows! And this is something you add to the page? Something you, or more realistically the platform that publishes the page on your behalf, adds. Most show notes pages don't have it. Most transcript pages, the kind I keep banging on about, do. Why don't show notes pages have it?
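To make "the bits the search engine reads that the human doesn't see" concrete, here is a minimal Python sketch that pulls the title and meta tags out of a page's source using only the standard library. The page source, the tag values, and the class name are invented for illustration; real crawlers read far more than this.

```python
from html.parser import HTMLParser


class MetadataReader(HTMLParser):
    """Collects the <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta tags are keyed by either a "name" or a "property" attribute.
            key = attributes.get("name") or attributes.get("property")
            if key and "content" in attributes:
                self.metadata[key] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


# Hypothetical page source, invented for illustration.
PAGE = """
<html><head>
<title>Rate Negotiation for Freelancers</title>
<meta name="description" content="How to negotiate a freelance rate.">
<meta property="og:type" content="article">
</head><body><h1>Episode 8</h1><p>Transcript text...</p></body></html>
"""

reader = MetadataReader()
reader.feed(PAGE)
page_metadata = reader.metadata
print(page_metadata)
```

None of the three values collected here is shown in the body of the page, which is the point Maya is making: a visitor sees the heading and the transcript, while a machine also reads the head of the document.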

CHAPTER 03 / 9 Discussion

Schema.org and Structured Data for Podcast Hosting

Structured data follows the schema.org vocabulary, allowing platforms to explicitly label titles, descriptions, durations, and transcripts for search engines. Many podcast hosting platforms fail to prioritize this markup, causing Google to index show notes as generic blog posts with audio rather than specialized podcast content. This technical legibility is increasingly vital for AI systems building topical databases of existing podcasts.

schema.org· structured data· hosting platforms· blog posts· legibility

04:59 Mostly because they're generated by hosting platforms that didn't prioritize it. The page exists to host an embedded player and a bit of text; nobody bothered to mark it up as a specific kind of page with specific properties, so Google sees a page with a player and some text… and doesn't know what it is! It indexes it like a blog post that happens to have audio on it. Okay. And the version of the page that has the markup gets treated differently? Gets treated very differently: shows up differently in search results, is more likely to be surfaced for specific queries, is (and this is the bit that increasingly matters) more readable to AI systems that are building their understanding of what podcasts exist on what topics. So it's not about ranking higher, exactly. It's about being legible.
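As a rough illustration of the schema.org markup discussed in this chapter, the Python sketch below builds a JSON-LD description of an episode and wraps it in the script element that pages commonly embed it in. PodcastEpisode, PodcastSeries, and AudioObject are schema.org types, but the selection of properties here is an assumption and should be checked against the schema.org reference and each search engine's own guidelines. All URLs and the duration value are placeholders.

```python
import json


def build_episode_jsonld(title, description, duration_iso, episode_url,
                         show_name, show_url, audio_url, transcript_text):
    """Return a schema.org PodcastEpisode description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "PodcastEpisode",
        "name": title,
        "description": description,
        "duration": duration_iso,  # ISO 8601 duration, e.g. "PT13M"
        "url": episode_url,
        "partOfSeries": {
            "@type": "PodcastSeries",
            "name": show_name,
            "url": show_url,
        },
        "associatedMedia": {
            "@type": "AudioObject",
            "contentUrl": audio_url,
            "transcript": transcript_text,
        },
    }


def to_script_tag(jsonld):
    """Wrap the JSON-LD in the script element used to embed it in a page."""
    return ('<script type="application/ld+json">'
            + json.dumps(jsonld, indent=2)
            + "</script>")


# Placeholder values: the example.com URLs and the duration are assumptions.
episode = build_episode_jsonld(
    title="Under the Hood",
    description="How transcript indexing actually works.",
    duration_iso="PT13M",
    episode_url="https://example.com/episode/8",
    show_name="How to Get Discovered",
    show_url="https://example.com/",
    audio_url="https://example.com/audio/episode-8.mp3",
    transcript_text="Welcome back to How to Get Discovered...",
)
script_tag = to_script_tag(episode)
print(script_tag)
```

With a block like this in the page, a crawler no longer has to guess that the page is a podcast episode: the type, the show it belongs to, and the transcript are stated outright.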

CHAPTER 04 / 9 Discussion

Transcript Structure and Addressable Moments via Timestamps

A raw wall of text from a machine transcription service lacks the headings and paragraph breaks necessary for effective search indexing. By breaking a transcript into sections with specific headings and timestamps, each segment receives its own unique URL or "addressable moment." This allows a user searching for a specific topic, such as freelance rate negotiation, to land directly on the relevant audio segment rather than a generic homepage.

timestamps· rate negotiation· url· indexing· transcript quality

05:50 It's about being legible. That's a really good way to put it! The unstructured page is legible to a human and illegible to a machine, the structured page is legible to both. Now the second thing... A transcript on a page is not the same as a transcript that works and the difference is mostly about structure. Walk me through it Imagine you've got an hour-long podcast. You run it through a machine transcription service, and you get back a text file. The text file is… let's be generous... about 10 thousand words. Maybe 12. It's mostly accurate—the speakers are roughly attributed. Sounds reasonable! Now imagine you paste that text file onto a web page under an embedded player

06:35 What does a search engine see? 10,000 words of text. Right! But more specifically... 10,000 words of text with no structure: no headings, no paragraphs in any meaningful sense… probably no timestamps… probably no clear topic breaks… just a wall. Okay. To a search engine, that wall is (and I'm going to oversimplify here, but this is roughly true) one big undifferentiated noun. It's a page that's about something, but the search engine can only get the broadest sense of what. It might pick up some keywords; it might notice that freelance appears a lot. But it can't tell which paragraph is about rate negotiation and which paragraph is about chasing invoices. And the structured version?

07:24 The structured version breaks the transcript into actual sections, with actual headings, with actual paragraph breaks and, crucially, with timestamps that link to specific moments. So the section about rate negotiation has a heading that says Rate Negotiation. This section has timestamps. This section can be linked to directly: there's a URL that takes you to that specific moment. Stop! Why does that last bit matter? Because the search engine doesn't just want to index "this page is about freelance topics." It wants to index "this specific moment in this episode is the answer to this specific question," which it can only do if the moment has its own address, its own URL. So instead of one page about freelancing, the same episode becomes, what, dozens of addressable moments?

08:19 That's exactly right. The episode becomes, let's say, 15 or 20 distinct moments, each with its own URL, each indexable independently… each able to surface for its own query. So when somebody Googles "how do I negotiate a freelance rate?", they don't land on the show's homepage... They don't even land on the episode page. They land on a specific moment in the specific episode where you talk about rate negotiation, and the page plays starting from that moment! That's a different shape of result… It's a completely different shape of result... And it is only possible because the transcript is structured. The wall-of-text version of the same content can't do any of this.
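One way to picture "addressable moments" is as a list of sections, each with a heading, a start time, and a URL derived from both. The Python sketch below assumes a hypothetical URL scheme (a slug under the episode URL plus a `t` query parameter in seconds); the episode does not say how any particular platform builds these addresses.

```python
import re
from dataclasses import dataclass


@dataclass
class Moment:
    heading: str
    start_seconds: int
    text: str


def slugify(heading):
    """Lowercase the heading and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", heading.lower()))


def format_timestamp(seconds):
    """Render a second count as MM:SS for display next to the heading."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def moment_url(episode_url, moment):
    """Build a per-moment address under the episode URL (hypothetical scheme)."""
    return f"{episode_url}/{slugify(moment.heading)}?t={moment.start_seconds}"


# Illustrative sections; the start times and text are made up.
EPISODE_URL = "https://example.com/episode/8"
moments = [
    Moment("Rate Negotiation", 444, "How to open the conversation..."),
    Moment("Chasing Invoices", 1210, "What to do when a client pays late..."),
]

moment_urls = [moment_url(EPISODE_URL, m) for m in moments]
for m, url in zip(moments, moment_urls):
    print(format_timestamp(m.start_seconds), m.heading, url)
```

Each section now has its own address, so a query about rate negotiation can resolve to that section rather than to the episode as a whole.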

CHAPTER 05 / 9 Discussion

Mechanics of Web Crawling and Search Indexing

The process of appearing in search results involves three distinct steps: discovery, crawling, and ranking. Google uses programs to visit pages and feed them into a giant database index based on sitemaps and domain history. Final ranking depends on a combination of metadata, transcript quality, page speed, and external links, most of which remain invisible to the end listener.

google crawl· sitemaps· domain authority· search results· indexing process

09:04 This is the bit where I think I actually want to ask the question that came in to Ask. Go! How does indexing actually happen? Like, mechanically... The transcript exists, the structure exists, the timestamps exist… What's actually happening between that page existing on the open web and that page showing up in a search result? Right. So Google, and others, crawl the web. They have programs that visit pages, read them and feed them back into an index. The index is basically a giant database that knows which pages contain which words and which topics. When somebody searches, Google looks in the index and ranks the results.
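The "giant database that knows which pages contain which words" is, in its simplest textbook form, an inverted index. The toy Python sketch below is not how Google works internally; it only shows the idea of mapping words to pages and looking a query up in that map, with no ranking step at all. The page URLs and text are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, sorted for stable output."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical crawled pages, invented for illustration.
pages = {
    "https://example.com/episode/8/rate-negotiation":
        "How to negotiate a freelance rate with a new client",
    "https://example.com/episode/8/chasing-invoices":
        "Chasing late invoices as a freelance designer",
}

index = build_index(pages)
print(search(index, "freelance rate"))
print(search(index, "freelance"))
```

Because each moment is its own page in this example, the two-word query can be matched to the rate negotiation section alone, while the broader query matches both.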

09:49 For a transcript page to show up in a search result, three things have to happen. One, Google has to know the page exists. Two, Google has to have crawled it and indexed it. Three, when somebody searches, Google has to decide that this page is one of the better answers to that search. And each of those steps is its own problem. Each of those steps is its own problem. Step one is easier than it used to be: you submit a sitemap, which is a list of all your pages, to Google directly. Step two depends on Google deciding the page is worth crawling… which depends on whether the domain has been useful in the past... which goes back to the Domain Authority stuff.

10:34 Step 3 is where everything we've talked about all season actually matters. The structure, the metadata, the transcript quality, the page speed, the links pointing at the page—all of it! And the listener doesn't see any of this? The listener doesn't see any of this. The listener types a question, gets a result, taps the result, lands on a podcast... They have no idea any of this happened. Okay, now I want to bring it back to the domain conversation. Because we did the philosophical version of this in episode 2 – Whose house are you building? And now I want to do the technical version. Why does the same transcript on the same machine hosted on two different domains perform differently in search? Because the domains have different histories
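The sitemap from step one is just an XML file listing page URLs. A minimal sketch of generating one with Python's standard library follows; the namespace is the one defined by the sitemaps.org protocol, while the URLs are placeholders and optional fields such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration.
sitemap_xml = build_sitemap([
    "https://example.com/episode/8",
    "https://example.com/episode/8/rate-negotiation",
])
print(sitemap_xml)
```

A file like this only covers the first of the three steps: it tells the search engine the pages exist, and says nothing about whether they get crawled or how they rank.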

CHAPTER 06 / 9 Discussion

Domain Authority and Subdomain Hosting Risks

Hosting transcripts on a platform's default subdomain prevents a podcast from building its own long-term search credibility. Established domains with years of history and engagement signals pass "authority" to new pages, allowing them to rank faster and higher. Podcasters using their own custom domains ensure that search performance and compound interest accrue to their own brand rather than the hosting service.

domain authority· subdomains· hosting platforms· credibility· search performance

11:25 Different histories. Different what we called authority in episode 2, and the way that translates technically is: Google has been crawling some domains for 15 years. It has data on every page they've ever published. Some of those pages have been linked to from other reputable places. Some of them have been read by humans for a long time. Engagement signals. All of that builds up. And when a new page goes up on that domain, it inherits the credibility. So… a new page on a high-history domain ranks faster than the same page on a new domain? Faster and higher! This is the bit that's relevant for the podcaster: when your transcripts are hosted on a hosting platform's default subdomain, you don't get any of that. The hosting platform has a domain. The platform's domain has authority.

12:18 But that authority is spread across all the podcasts they host. Your specific show on its specific subdomain is a tenant, not a property owner. Right! Whereas if the transcripts live on your own domain—yourshowname dot com slash episode slash whatever—every page that goes up there is building authority for your domain, not somebody else's. Years from now, your domain has its own history. Its own credibility. Its own search performance. This is the CNAME conversation again… From episode 2! But I want to do the technical version now because the question I left hanging in episode two was how is the page actually served?

13:04 Mechanically, how does the listener type archive.myshowname.com slash episode slash whatever and end up on a page hosted by somebody else? Walk me through it. When somebody types a URL into their browser, the first thing that happens is a DNS lookup. DNS (Domain Name System) is roughly the phonebook of the internet. It translates archive dot myshowname dot com into the address of a server. A CNAME record in your DNS is a kind of redirect. It says: when somebody looks up archive dot myshowname dot com, send them to a different address, the address of the service hosting your transcripts. So the URL the listener sees is yours. The actual server they hit is somebody else's. Exactly. And here's the important bit. The search engine doesn't really care about the server; it cares about the URL.
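The lookup described here can be simulated with a small table of records. In the Python sketch below, the hosting provider's hostname and the IP address are invented (the address comes from a range reserved for documentation), and the resolver is a toy: real DNS resolution involves caching and multiple servers. It is enough to show that a CNAME is an alias followed during the lookup, while the name in the address bar stays the same.

```python
# Hypothetical DNS records, invented for illustration.
RECORDS = {
    "archive.myshowname.com": ("CNAME", "pages.transcript-host.example"),
    "pages.transcript-host.example": ("A", "203.0.113.10"),
}


def resolve(hostname, records, max_hops=8):
    """Follow CNAME aliases until an A record is found.

    Returns (ip_address, chain_of_names_visited).
    """
    chain = [hostname]
    for _ in range(max_hops):
        record = records.get(chain[-1])
        if record is None:
            raise LookupError(f"no record for {chain[-1]}")
        record_type, value = record
        if record_type == "A":
            return value, chain
        # CNAME: continue the lookup under the alias target.
        chain.append(value)
    raise LookupError("too many CNAME hops")


ip_address, chain = resolve("archive.myshowname.com", RECORDS)
print(ip_address, chain)
```

The browser connects to the address found at the end of the chain, but the URL it requested, and the one a search engine indexes, still begins with archive.myshowname.com.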

CHAPTER 07 / 9 Discussion

DNS Lookups and CNAME Records for Custom Domains

The Domain Name System (DNS) acts as the internet's phonebook, translating domain names into server addresses. A CNAME record allows a podcaster to use their own URL while the actual content is served by a third-party hosting provider. Because search engines index content against the URL rather than the physical server, the podcaster retains all search authority and avoids "renting" space on another entity's domain.

dns lookup· cname record· redirect· server· url

14:01 The URL is yours, so the page indexes against your domain. The authority accrues to your domain. The compound interest over years is yours. And without that? You're just renting! You're just renting. And there's a related thing. Once the transcripts are on your domain, you can connect the domain to Google Search Console. We talked about this in episode 2, but I want to do the technical version now too. Go. Search Console is Google's free tool. It tells you, for any domain you control, which queries are bringing people to your pages, which pages are being clicked, which pages are appearing in search results but not being clicked.

CHAPTER 08 / 9 Discussion

Google Search Console Benefits for Discoverability

Google Search Console is a free tool that provides data on which queries bring visitors to a site and how pages perform in search results. This tool is only available for domains that a user can prove they control, making it inaccessible for those using platform-hosted subdomains. Access to these metrics allows creators to see which episodes are earning impressions and which titles need improvement.

google search console· queries· impressions· click-through rate· data
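The metrics named in this chapter (impressions, clicks, average position) lend themselves to a simple check for pages that appear in results but rarely get clicked. The Python sketch below works on made-up rows and does not call any Search Console API; the field names and both thresholds are arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    impressions: int
    clicks: int
    average_position: float

    @property
    def click_through_rate(self):
        """Clicks divided by impressions, or 0.0 when there were none."""
        if self.impressions == 0:
            return 0.0
        return self.clicks / self.impressions


def seen_but_not_clicked(rows, min_impressions=100, max_ctr=0.01):
    """Return URLs shown often in results but clicked rarely.

    Both thresholds are arbitrary assumptions, not recommended values.
    """
    return [row.url for row in rows
            if row.impressions >= min_impressions
            and row.click_through_rate <= max_ctr]


# Made-up numbers for illustration.
rows = [
    PageStats("https://example.com/episode/3", 1200, 6, 7.4),
    PageStats("https://example.com/episode/5", 800, 64, 4.1),
    PageStats("https://example.com/episode/8", 40, 0, 22.0),
]

flagged = seen_but_not_clicked(rows)
print(flagged)
```

A page flagged this way is ranking for something but not earning the click, which is the situation in which the episode suggests looking at the title or description.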

14:44 What your average position is for any given query. It is, without exaggeration, the most useful piece of free software a podcaster who cares about discoverability can use, and you can only use it for domains you can prove you control. Which means platform-hosted transcripts? You can't connect them to Search Console. The data exists; Google is generating it. That's the bit that bothered me most in episode 2. I remember. It bothered you because it was the part where there was actual money on the table, or actual data anyway. Right. PodHerd on the higher tier has the Search Console integration set up, so you can plug your domain in and they show you the metrics directly, which is, I'll be honest, the thing that made the biggest difference for me.

CHAPTER 09 / 9 Discussion

Tom's PodHerd Experiment and Show Outro

Tom admits to starting a three-month experiment by setting up a feed on the PodHerd starter tier to test the discoverability theories discussed throughout the season. While he is currently using the podherd.com domain rather than a custom CNAME, he intends to use the resulting data to validate Maya's claims about search performance. The hosts conclude the episode by previewing next week's discussion on data compounding.

podherd· starter tier· experiment· data· podcasting

15:36 Not because the metrics changed anything about what I made, but because for the first time I could see what was actually happening. Which episodes were earning impressions. Which queries were bringing people in. Which episodes were ranking but not being clicked, which probably meant that the title or the description wasn't doing its job. Okay. This is the admission. Oh, I told you I'd have one! Tell me... I set up my own feed on PodHerd about three weeks ago. Tom… I did the thing. You did the-? I did the thing on the starter tier, because I'm not yet ready to admit that I am convinced enough to use the CNAME version. I'm using their domain, podherd dot com slash my show, which I'm fine with for now. Tom…

16:30 I'm not converted. I am experimenting. You are converted! You're experimenting because you're convinced! I'm experimenting because I want to see if the things you've been saying for 7 episodes are actually true, and the cheapest way to find out is to try it. That is the most begrudging way anybody has ever told me they did a thing. I'm a begrudging man. How long are you going to give it? Three months. Three months, and then I'll have data. And then either I'll have to admit you were right, or I'll have ammunition for the rest of my life. Either way you'll have data. Either way I'll have data! This is genuinely a big moment for the show... Don't make a big deal of it!! I'm making a big deal of it. The whole season has been pointing at this. I'm sorry I'm peaking in episode 8… It's the right episode for it. Next week. Next week is the data one. Compounding.

17:34 You're delighted. Open. Open is a big move from where you started! Don't get used to it... Thanks for listening to How To Get Discovered, we'll see ya next week! See ya next week!
