What do you do when you suddenly have a lot of free time?
I had Juneteenth, a flex day, and the weekend all lined up. That gave me enough time to do something useful, responsible, or healthy.
So I opened PowerShell.
The original idea was simple. I wanted PowerShell to sound off error messages.
That was it.
Nothing big. Nothing carefully planned. I just thought it would be funny if a script failed and the computer announced the error out loud instead of quietly throwing red text on the screen.
Something like:
“Backup failed.”
Or:
“The server is not responding.”
Or the classic:
“Command failed successfully.”
That was supposed to be the whole project.
A talking error notifier.
Small. Dumb. Funny.
Then the question changed.
If PowerShell could speak an error message, could it speak any message? If it could speak text, could it read a page? If it could read a page, could it read a PDF?
That is how the project moved from sounding off error messages to reading PDFs.
At some point, scanned PDFs entered the picture too, which meant OCR had to join the party. Then came page selection, whole-document search, search result reading, speech controls, and a menu.
So yes, this started as a joke.
It did not stay a joke for very long.
PowerShell on Ubuntu
I built this in PowerShell running on Ubuntu.
That sounds a little odd at first because people usually associate PowerShell with Windows. But modern PowerShell runs on Windows, Linux, and macOS [1]. That is what made this project interesting.
PowerShell handled the structure of the script: the menu, user input, functions, search logic, and decisions about what to do next.
Ubuntu provided the environment and the command-line tools.
The script works because PowerShell can call external utilities. It does not personally extract PDF text, count pages, perform OCR, or speak audio. It calls tools that already do those jobs.

Here are the dependencies I installed:
sudo apt update
sudo apt install poppler-utils ocrmypdf tesseract-ocr tesseract-ocr-eng speech-dispatcher
Each dependency has a specific job:
poppler-utils: Provides PDF tools such as pdftotext and pdfinfo
ocrmypdf: Creates searchable OCR PDFs
tesseract-ocr: Performs the OCR work
tesseract-ocr-eng: Adds English OCR support
speech-dispatcher: Provides spd-say for text-to-speech
pdftotext converts PDF content to plain text [2]. pdfinfo prints PDF metadata, including page count [3]. OCRmyPDF adds a text layer to scanned PDFs so they can be searched or copied, and it uses Tesseract for OCR [4]. spd-say sends text-to-speech requests to Speech Dispatcher [5].
So the relationship is simple:
PowerShell: Controls the script
Ubuntu: Provides the environment
Linux tools: Handle PDF extraction, OCR, and speech
That is one of the things I like about scripting. You do not always need to build every piece yourself. Sometimes the real work is knowing which tools already exist and making them work together.
Checking for Required Tools
The first practical thing the script does is check whether the required commands are available.

Instead of waiting for the script to break halfway through, it checks up front:
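A minimal sketch of that kind of up-front check might look like the following. The helper name Get-MissingCommand is hypothetical, and the tool list is simply taken from the dependencies above; the real script may do this differently.

```powershell
# Hypothetical helper: returns the names of commands that cannot be found.
function Get-MissingCommand {
    param([string[]]$Name)
    $Name | Where-Object { -not (Get-Command -Name $_ -ErrorAction SilentlyContinue) }
}

# Tools the script relies on (from the dependency list above).
$required = 'pdftotext', 'pdfinfo', 'ocrmypdf', 'spd-say'
$missing = @(Get-MissingCommand -Name $required)

if ($missing.Count -gt 0) {
    Write-Host "Missing tools: $($missing -join ', ')" -ForegroundColor Red
    Write-Host 'Install with: sudo apt install poppler-utils ocrmypdf tesseract-ocr tesseract-ocr-eng speech-dispatcher'
    # A real script would probably stop here instead of continuing.
}
```

Get-Command works for external programs as well as cmdlets, so the same check covers everything the script calls.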

This is not the exciting part, but it matters. If one of those tools is missing, the script tells me what is missing and how to install it.
That is better than getting a vague error later and pretending I enjoy troubleshooting my own weekend project.
Finding PDFs Automatically
I did not want to type the full path to a PDF every time.
So if I do not provide a file path, the script checks the folder where it is running and lists the PDFs there.

Then it looks for PDF files:
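As a rough sketch, the lookup and the numbered list could be done like this. The function name Get-PdfInFolder is hypothetical, and using the current location as the default folder is an assumption about how the script decides where it is running.

```powershell
# Hypothetical helper: return the PDF files in a folder, sorted by name.
function Get-PdfInFolder {
    param([string]$Folder = (Get-Location).Path)
    Get-ChildItem -LiteralPath $Folder -File |
        Where-Object { $_.Extension -ieq '.pdf' } |   # case-insensitive, so .PDF also counts
        Sort-Object -Property Name
}

# Show a numbered list so a PDF can be picked by number.
$pdfs = @(Get-PdfInFolder)
for ($i = 0; $i -lt $pdfs.Count; $i++) {
    Write-Host ('{0}. {1}' -f ($i + 1), $pdfs[$i].Name)
}
```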

That means I can run the script without any arguments, and it can show me a numbered list of PDFs in the same folder.
Less typing. Less friction. Much better.
Pulling Text from a PDF
Once a PDF is selected, the script needs to extract text from it.
That is where pdftotext comes in.

The important line is this:

The -layout option keeps the original physical layout of the text as closely as it can, while -f and -l set the first and last pages to convert [2].
The script extracts the text into a temporary file, reads it into PowerShell, and deletes the temporary file afterward.
PowerShell stays in control, but the specialized PDF work is handled by a dedicated tool.
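Here is a minimal sketch of that extract, read, and clean-up flow. The function name Get-PdfPageText and its parameters are hypothetical; only the pdftotext options come from the description above.

```powershell
# Hypothetical helper: extract the text of a page range with pdftotext.
function Get-PdfPageText {
    param(
        [Parameter(Mandatory)][string]$PdfFile,
        [int]$StartPage = 1,
        [int]$EndPage = 1
    )
    $tempFile = [System.IO.Path]::GetTempFileName()
    try {
        # pdftotext [options] <PDF-file> <text-file>
        & pdftotext -layout -f $StartPage -l $EndPage $PdfFile $tempFile
        if ($LASTEXITCODE -ne 0) {
            throw "pdftotext failed with exit code $LASTEXITCODE"
        }
        # Read the whole file as one string.
        Get-Content -LiteralPath $tempFile -Raw
    }
    finally {
        # Remove the temporary file even if extraction failed.
        Remove-Item -LiteralPath $tempFile -ErrorAction SilentlyContinue
    }
}
```

The try/finally block is a design choice in this sketch: it makes sure the temporary file is deleted even when pdftotext reports an error.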
Getting the Page Count
For whole-document search, the script needs to know how many pages are in the PDF.
For that, it uses pdfinfo.

pdfinfo returns metadata about the PDF, including a line like this:
Pages: 472
The script finds that line, pulls out the number, and uses it as the last page.
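Pulling the number out of that line could be sketched like this. The function name Get-PageCountFromPdfInfo is hypothetical, and it takes the pdfinfo output as input so the parsing can be tried without a PDF at hand.

```powershell
# Hypothetical helper: find the "Pages:" line in pdfinfo output and return the number.
function Get-PageCountFromPdfInfo {
    param([string[]]$InfoLines)
    foreach ($line in $InfoLines) {
        if ($line -match '^Pages:\s+(\d+)') {
            return [int]$Matches[1]
        }
    }
    throw 'No "Pages:" line found in the pdfinfo output.'
}

# Example with canned output; a script would pass the result of: & pdfinfo $PdfFile
$sample = 'Title:          Network Book', 'Pages:          472'
$totalPages = Get-PageCountFromPdfInfo -InfoLines $sample   # 472
```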
That lets the search function search from page 1 through the end of the book:
Search-PdfText `
    -PdfFile $PdfFile `
    -SearchText $SearchText `
    -StartPage 1 `
    -EndPage $totalPages `
    -ReadMatches $ReadMatches
That made the search feature cleaner.
Search should search the whole file unless I tell it otherwise.
Adding OCR for Scanned PDFs
PDFs are not all the same.
Some PDFs contain real text. You can highlight the words, copy them, search them, and extract them.
Other PDFs are basically images. They look like text to a person, but to the computer, they are just pictures on a page.
That is where OCR comes in.
For scanned PDFs, I used OCRmyPDF.


This creates a searchable version of the PDF.
For example:
network-book.pdf
becomes:
network-book.searchable.pdf
OCRmyPDF adds an OCR text layer to scanned PDFs, which allows them to be searched or copied [4]. Tesseract performs the OCR work behind the scenes [6].
The OCR options help clean up the result:
--skip-text: Skips pages that already have text
--deskew: Straightens crooked scans
--rotate-pages: Fixes rotated pages when possible
-l eng: Uses English OCR
This was a big turning point.
Before OCR, the script worked best with PDFs that already had selectable text. After OCR, it could work with scanned books too.
That changed the project from a funny talking script into something that could actually help with old manuals, books, and technical documents.
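For reference, the OCR step described above could be sketched as follows. The function names are hypothetical; the options and the .searchable.pdf naming come from this section.

```powershell
# Hypothetical helper: network-book.pdf -> network-book.searchable.pdf
function Get-SearchablePdfPath {
    param([Parameter(Mandatory)][string]$PdfFile)
    [System.IO.Path]::ChangeExtension($PdfFile, '.searchable.pdf')
}

# Hypothetical helper: run OCRmyPDF and return the path of the searchable copy.
function New-SearchablePdf {
    param([Parameter(Mandatory)][string]$PdfFile)
    $output = Get-SearchablePdfPath -PdfFile $PdfFile
    # ocrmypdf [options] <input-file> <output-file>
    & ocrmypdf --skip-text --deskew --rotate-pages -l eng $PdfFile $output
    if ($LASTEXITCODE -ne 0) {
        throw "ocrmypdf failed with exit code $LASTEXITCODE"
    }
    $output
}
```

Writing to a separate file keeps the original scan untouched, so a bad OCR run never costs the source document.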
Searching the PDF
Once the script can extract text, searching becomes possible.
The search function goes page by page, splits the extracted text into lines, and checks each line for the search term.
This line handles the match:

That makes the search case-insensitive.
So searching for:
tcp
will still match:
TCP
Tcp
tCp
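One way to get that behavior, shown here as a sketch rather than the script's actual line, is PowerShell's -match operator, which ignores case by default. The helper name Test-LineMatch is hypothetical, and escaping the search text is an assumption so that characters like . or ? are treated literally.

```powershell
# Hypothetical helper: case-insensitive, literal search for text inside a line.
function Test-LineMatch {
    param([string]$Line, [string]$SearchText)
    # -match is case-insensitive by default; Escape() disables regex metacharacters.
    $Line -match [regex]::Escape($SearchText)
}

Test-LineMatch -Line 'The Tcp handshake has three steps.' -SearchText 'tcp'   # True
```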
When a match is found, the script stores it as an object:

That object keeps track of the useful details:
Match number
Page number
Line number
Matching line
Nearby context
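A sketch of how such objects could be built for one page of extracted text is shown below. The function name Find-PageMatch, its parameters, and the property names are all hypothetical; they simply mirror the details listed above.

```powershell
# Hypothetical helper: search one page of text and emit one object per matching line.
function Find-PageMatch {
    param(
        [string]$PageText,
        [int]$PageNumber,
        [string]$SearchText,
        [int]$ContextLines = 1,       # lines of context before and after the match
        [int]$FirstMatchNumber = 1    # lets numbering continue across pages
    )
    $lines = $PageText -split '\r?\n'
    $pattern = [regex]::Escape($SearchText)
    $matchNumber = $FirstMatchNumber

    for ($i = 0; $i -lt $lines.Count; $i++) {
        if ($lines[$i] -match $pattern) {
            $from = [Math]::Max(0, $i - $ContextLines)
            $to = [Math]::Min($lines.Count - 1, $i + $ContextLines)
            [pscustomobject]@{
                MatchNumber = $matchNumber
                Page        = $PageNumber
                Line        = $i + 1
                Text        = $lines[$i].Trim()
                Context     = ($lines[$from..$to] -join "`n")
            }
            $matchNumber++
        }
    }
}

$results = @(Find-PageMatch -PageText "Intro`nTCP handshake`nSummary" -PageNumber 7 -SearchText 'tcp')
```

Because each result carries its page and line number, later steps can look up a result by its match number and go back to the right place in the PDF.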
That matters because the script is not just printing search results and moving on. It remembers where the results came from.
That makes the next part possible.
Reading Search Results
After a search, the script lets me choose what I want to hear.
The options look like this:
1: Read matching line 1
c1: Read match 1 with context
p1: Read the page where match 1 was found
a: Read all matching lines
Enter: Return to the menu
If I type:
p1
the script reads the full page where match number 1 was found.
This part handles that:

The context option works the same way:
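One way to interpret those inputs, sketched under the assumption that the choice arrives as a plain string from Read-Host, is to turn it into an action name plus a match number. The function name ConvertFrom-ResultChoice and the action names are hypothetical.

```powershell
# Hypothetical helper: translate "1", "c1", "p1", "a", or Enter into an action.
function ConvertFrom-ResultChoice {
    param([string]$Choice)
    $c = $Choice.Trim().ToLowerInvariant()
    $prefixToAction = @{ '' = 'Line'; 'c' = 'Context'; 'p' = 'Page' }
    $action = 'Unknown'
    $number = $null

    if ($c -eq '') {
        $action = 'Menu'                       # Enter returns to the menu
    }
    elseif ($c -eq 'a') {
        $action = 'All'                        # read all matching lines
    }
    elseif ($c -match '^([cp]?)(\d+)$') {
        $action = $prefixToAction[$Matches[1]] # no prefix, c, or p
        $number = [int]$Matches[2]
    }

    [pscustomobject]@{ Action = $action; MatchNumber = $number }
}

ConvertFrom-ResultChoice -Choice 'p1'   # Action = Page, MatchNumber = 1
```

With the choice parsed this way, the page and context options differ only in which part of the stored match gets sent to the speech function.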

That was the point where the project crossed over.
It was no longer just making the computer talk.
It was finding information and reading it back.
Making It Speak
The speech part uses spd-say, which sends text-to-speech requests to Speech Dispatcher [5]. Speech Dispatcher provides a common interface for speech synthesis [7].
Before sending text to the speech engine, the script cleans it up and breaks it into smaller chunks.

Then each chunk gets sent to spd-say.

The -w option tells spd-say to wait until speaking is finished [5].
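The clean-up and chunking step could look roughly like this. The function name Split-SpeechText and the 200-character default are assumptions for illustration, not values from the actual script.

```powershell
# Hypothetical helper: collapse whitespace and split text into chunks of whole words.
function Split-SpeechText {
    param([string]$Text, [int]$MaxLength = 200)
    # Turn line breaks, tabs, and repeated spaces into single spaces.
    $clean = ($Text -replace '\s+', ' ').Trim()
    if ($clean -eq '') { return }

    $chunk = ''
    foreach ($word in ($clean -split ' ')) {
        if ($chunk -eq '') {
            $chunk = $word
        }
        elseif (($chunk.Length + 1 + $word.Length) -gt $MaxLength) {
            $chunk            # emit the finished chunk
            $chunk = $word    # a single over-long word still becomes its own chunk
        }
        else {
            $chunk = "$chunk $word"
        }
    }
    $chunk
}

# Each chunk would then be spoken in turn, for example:
#   foreach ($chunk in (Split-SpeechText -Text $pageText)) { & spd-say -w $chunk }
```

Splitting on word boundaries keeps the speech engine from being handed half a word at the end of a chunk.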
That worked, but it created a new problem.
If the computer starts reading something long, I need a way to stop it.
Adding a Stop Key
So I added a way to stop speech while it is running.
The script checks for S, Q, or Esc.

While spd-say is speaking, PowerShell keeps checking for those keys.
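A sketch of that polling loop is below. It is an assumption about the approach, not the script's actual code: it starts spd-say through the .NET Process class, checks the keyboard while the process runs, and uses the same spd-say --cancel call the main loop uses on exit. The function names are hypothetical.

```powershell
# Hypothetical helper: is this one of the stop keys (S, Q, or Esc)?
function Test-StopKey {
    param([System.ConsoleKey]$Key)
    $Key -in @([System.ConsoleKey]::S, [System.ConsoleKey]::Q, [System.ConsoleKey]::Escape)
}

# Hypothetical helper: speak text, returning $false if the user interrupted it.
function Invoke-InterruptibleSpeech {
    param([Parameter(Mandatory)][string]$Text)

    $psi = [System.Diagnostics.ProcessStartInfo]::new('spd-say')
    $psi.UseShellExecute = $false
    $psi.ArgumentList.Add('-w')      # wait until the text has been spoken
    $psi.ArgumentList.Add($Text)
    $proc = [System.Diagnostics.Process]::Start($psi)

    while (-not $proc.HasExited) {
        if ([Console]::KeyAvailable) {
            $key = [Console]::ReadKey($true).Key   # $true: do not echo the key
            if (Test-StopKey -Key $key) {
                & spd-say --cancel
                if (-not $proc.HasExited) { $proc.Kill() }
                return $false
            }
        }
        Start-Sleep -Milliseconds 100
    }
    return $true
}
```

Returning $false on interruption lets the caller stop sending the remaining chunks instead of starting the next one.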

That one feature made the script feel much safer to use.
A tool that can read a whole page out loud also needs to know when to be quiet.
Keeping Search Results Readable
Search results can get long, especially in technical books.
So I added paged output.

When the output reaches the bottom of the terminal, the script pauses.
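A simple version of that pause could be sketched like this. Both function names are hypothetical, and reading the terminal height from $Host.UI.RawUI.WindowSize is an assumption about how the script finds the bottom of the screen.

```powershell
# Hypothetical helper: 1-based line numbers after which output should pause.
function Get-PauseLineNumber {
    param([int]$LineCount, [int]$PageSize)
    if ($PageSize -lt 1) { throw 'PageSize must be at least 1.' }
    for ($n = $PageSize; $n -lt $LineCount; $n += $PageSize) { $n }
}

# Hypothetical helper: print lines, pausing each time a screenful has been shown.
function Write-PagedLine {
    param([string[]]$Lines, [int]$PageSize = 0)
    if ($PageSize -le 0) {
        # Leave two rows for the prompt; fall back to 1 if the height is unknown.
        $PageSize = [Math]::Max(1, $Host.UI.RawUI.WindowSize.Height - 2)
    }
    $pauseAfter = @(Get-PauseLineNumber -LineCount $Lines.Count -PageSize $PageSize)
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        Write-Host $Lines[$i]
        if ($pauseAfter -contains ($i + 1)) {
            $null = Read-Host '-- more: press Enter --'
        }
    }
}
```

The pause points never include the last line, so a result list that fits exactly on the screen does not end with a pointless prompt.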

That keeps the results from flying past the screen.
It is a small thing, but small things like that make scripts easier to use.
The Menu
Once the script had enough features, it needed a menu.

The main loop keeps everything running.
$running = $true
while ($running) {
    Show-Menu -CurrentPdf $currentPdf
    $choice = Read-Host "Choose an option"
    switch ($choice) {
        # Search the whole current PDF without reading the matches aloud
        "5" {
            $searchText = Read-Host "Enter word or phrase to search"
            Search-WholePdf `
                -PdfFile $currentPdf `
                -SearchText $searchText `
                -ReadMatches $false
            Read-Host "Press Enter to continue"
        }
        # Cancel any speech still playing, then leave the loop
        "0" {
            spd-say --cancel
            Write-Host "Goodbye." -ForegroundColor Cyan
            $running = $false
        }
    }
}
At that point, the tiny error-message idea had become a menu-driven PDF reader with OCR, search, and speech.
That was not the plan.
That was just where the next question kept leading.
Will It Work on Windows?
Yes, but not exactly as written.
The PowerShell parts are mostly portable because PowerShell runs on Windows, Linux, and macOS [1]. The menu, functions, loops, objects, and text-processing logic should work anywhere PowerShell runs.
The operating system-specific parts are the external commands.
This version was built for Ubuntu, so it uses:
pdftotext
pdfinfo
ocrmypdf
spd-say
On Windows, PowerShell itself is not the problem. The issue is replacing or installing the supporting tools.
A Windows version would need to handle a few things differently:
PDF text extraction: Install Poppler for Windows or use another PDF library
OCR: Install Tesseract and OCRmyPDF, or use WSL/Docker
Text-to-speech: Use Windows speech instead of spd-say
File paths: Handle paths like C:\Users\...
Opening files: Use Invoke-Item instead of xdg-open
The biggest change would probably be speech.
On Ubuntu, this script uses:
& spd-say "Backup failed."
On Windows, I would use the Windows speech engine instead. Microsoft’s System.Speech.Synthesis.SpeechSynthesizer provides access to installed text-to-speech voices on the host computer [8].
A Windows version might use something like this:
Add-Type -AssemblyName System.Speech
$speaker = New-Object System.Speech.Synthesis.SpeechSynthesizer
$speaker.Speak("Backup failed.")
So yes, the idea works on Windows, but the speech function would need a Windows version.
What About macOS or Other Linux Distros?
The same idea could work on macOS too, but the setup would change.
macOS has its own say command for text-to-speech [9].
So instead of using spd-say, a macOS version could call:
say "Backup failed."
The PDF and OCR tools could likely be installed with Homebrew. Homebrew provides formulas for tools like OCRmyPDF and Tesseract [10][11].
Other Linux distributions should also work, but the install command would change.
Ubuntu uses:
sudo apt install poppler-utils ocrmypdf tesseract-ocr tesseract-ocr-eng speech-dispatcher
Debian uses the same apt tooling, while Fedora, Arch, and other distributions use different package managers and may package the tools under different names.
The practical breakdown is this:
PowerShell logic: Mostly portable
PDF extraction: Portable if Poppler tools are installed
OCR: Portable if OCRmyPDF and Tesseract are installed
Speech: Depends on the operating system
File opening/paths: Depends on the operating system
If I wanted to make the script truly cross-platform, I would probably add operating system detection.
Something like:
if ($IsLinux) {
    $speechCommand = "spd-say"
}
elseif ($IsMacOS) {
    $speechCommand = "say"
}
elseif ($IsWindows) {
    # Placeholder name, not a real command; Windows would use System.Speech instead
    $speechCommand = "windows-speech"
}
Then the script could choose the right speech method depending on where it is running.
But for this version, I kept it focused on Ubuntu.
That was enough for one weekend.
Another Project for Another Weekend
Naturally, once the script could read PDFs, another idea showed up.
What about EPUBs?
That is where I had to stop.
EPUB support is absolutely possible. EPUB files are basically ZIP archives of packaged HTML, so the script could eventually unpack them, find the chapters, convert the text, search through it, and read it out loud.
But that is not this weekend’s problem.
That is another project for another weekend.
This one already went from “make PowerShell sound off error messages” to “build a PDF OCR search-and-speech reader.”
That is enough escalation for now.
EPUBs can wait.
Probably.
Closing
Not every weekend project needs to become something polished, packaged, or practical.
Sometimes it is enough to follow the idea, break a few things, fix a few things, and end up with a script that does more than you expected when you started.
This one began with PowerShell talking back.
Somehow, it ended with PDFs talking back.
That feels about right.
Sources
[1] Microsoft Learn — Install PowerShell on Windows, Linux, and macOS
[2] Debian Manpages — pdftotext(1) — poppler-utils
[3] Debian Manpages — pdfinfo(1) — poppler-utils
[4] OCRmyPDF Documentation / Project — OCRmyPDF: Adds an OCR text layer to scanned PDF files
[5] Mankier Manpage — spd-say(1) — send text-to-speech request to Speech Dispatcher
[6] Tesseract OCR — Tesseract User Manual
[7] Speech Dispatcher Project — Speech Dispatcher GitHub Repository
[8] Microsoft Learn — System.Speech.Synthesis.SpeechSynthesizer Class
[9] SS64 — say command for macOS text-to-speech
[10] Homebrew Formulae — OCRmyPDF formula
[11] Homebrew Formulae — Tesseract formula