
How to Search a PDF Semantically in Chrome (Without Uploading It Anywhere)

Published May 27, 2026

The 60-second answer

If you got here by searching "how to search a PDF semantically in Chrome," here's the fastest path to the answer. The rest of this post is for people who want to understand what's happening underneath; if you just need the steps, you can stop after this section.

  1. Install ctrlQuery from the Chrome Web Store.
  2. Open your PDF in Chrome. Drag the file into a tab, or open one that's already hosted online. It loads in Chrome's built-in PDF viewer.
  3. Click the ctrlQuery toolbar icon. You'll see a "PDF detected" banner with an Open Viewer button — click it. ctrlQuery's custom PDF viewer opens the same document in a new tab.
  4. Click the toolbar icon again on the new tab. Toggle Smart Search (the AI mode). The first time you do this on any machine, a small embedding model (roughly 30MB) downloads to your browser and caches locally. After that, the toggle is instant.
  5. Type a question or concept in plain language. Not keywords, a question. "How does this policy handle stolen tokens" is the right kind of query. "stolen" is not.

That's the whole flow. The PDF stays on your machine. No upload, no account, no API call to OpenAI or anyone else. The model runs in your browser tab and ranks the passages of the PDF by how close their meaning is to your query. Closest matches get highlighted in order.

If your PDF is scanned (an image, not real text), this won't work without OCR first. If it's a local file (a file:///... URL), you'll need to grant ctrlQuery file URL access once — both covered in the FAQ.

Install ctrlQuery

Why Ctrl+F fails on PDFs

Chrome's native PDF search is one of the weakest find functions in any major piece of software, and it has been that way for years. It breaks down on serious PDFs for three reasons.

It only matches literal characters. The author of the document chose specific words, and if your mental model uses different ones, Ctrl+F can't bridge the gap. A NIST publication uses "memorized secret" where most readers would search "password." A medical paper uses "intracranial hemorrhage" where you'd type "brain bleed." A legal contract uses "indemnification" where you'd type "who pays if something goes wrong." Ctrl+F treats these as unrelated.
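The gap is easy to reproduce in a few lines. This is a minimal sketch of literal matching in general, not Chrome's actual implementation; the `literalFind` helper and the sample sentence are invented for illustration.

```typescript
// Hypothetical helper: case-insensitive literal search, roughly what a
// find-in-page feature does. Returns the start offset of every match.
function literalFind(text: string, query: string): number[] {
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  const hits: number[] = [];
  if (needle.length === 0) return hits;
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) break;
    hits.push(at);
    from = at + needle.length;
  }
  return hits;
}

// Invented sample sentence in the style of a standards document.
const passage =
  "A memorized secret must be kept confidential by the subscriber.";

// The reader's word finds nothing; only the author's word matches.
const passwordHits = literalFind(passage, "password");
const secretHits = literalFind(passage, "memorized secret");
```

The passage is about passwords in every sense that matters to a reader, but a character-level match has no way to know that.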

Multi-column layouts confuse it. Academic papers and government reports often have two-column text, sidebars, and footnotes. The PDF text layer doesn't always preserve reading order. Searching for a phrase that wraps across columns sometimes returns nothing, even when the phrase is plainly visible on the page.

Plenty of PDFs have broken text layers. Older scans, exported slide decks, and documents created by converting from Word with unusual fonts can have text that displays correctly but indexes as gibberish or with bad character encoding. Ctrl+F returns zero matches on documents that very clearly contain the word you searched for.

For a short PDF you wrote yourself, none of this matters. For a 60-page NIST publication, a 200-page case file, or an academic paper full of jargon, Ctrl+F is the wrong tool. You end up scrolling and skimming, which defeats the purpose of having a search function at all. (Full Ctrl+F comparison: Ctrl+F vs ctrlQuery.)

What semantic search actually does

The non-jargon version:

Instead of looking for the exact characters you typed, semantic search reads the meaning of each paragraph in the PDF and ranks them by how close that meaning is to what you asked.

The way it does this is through embeddings. An embedding is a list of numbers that represents the meaning of a piece of text. Sentences with similar meanings get similar lists of numbers, even when the words are totally different. "How do I cancel my subscription" and "terminate your account" produce embedding lists that sit close to each other in the math. ctrlQuery uses a model called bge-small-en-v1.5, which produces 384 numbers per chunk of text, and runs entirely inside your browser tab via transformers.js.
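To make "close to each other" concrete, here is a minimal sketch using cosine similarity, a common way to compare embeddings. Whether ctrlQuery uses exactly this measure is an assumption, and the four-number vectors below are made-up stand-ins for the 384 numbers a real model would produce.

```typescript
// Cosine similarity: near 1 means the vectors point the same way,
// near 0 means they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy embeddings, invented for illustration only.
const cancelSubscription = [0.9, 0.1, 0.0, 0.4]; // "How do I cancel my subscription"
const terminateAccount = [0.8, 0.2, 0.1, 0.5];   // "terminate your account"
const weatherReport = [0.0, 0.9, 0.4, 0.0];      // an unrelated sentence

const relatedScore = cosineSimilarity(cancelSubscription, terminateAccount);
const unrelatedScore = cosineSimilarity(cancelSubscription, weatherReport);
```

With these toy numbers, `relatedScore` comes out far higher than `unrelatedScore`, which is the property semantic search relies on: similar meaning, similar direction.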

When you type a query, the model produces an embedding for that query too. A hybrid index then ranks every paragraph in the PDF by two signals: vector similarity (how close the embeddings are) and a traditional keyword score for exact-word matches. The top results get highlighted in the document.
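A rough sketch of how two signals can be blended into one ranking. The weighted sum, the 0.7 weight, the `keywordScore` formula, and the sample similarity values are all assumptions made for illustration; the post does not say how ctrlQuery combines its scores.

```typescript
// Hypothetical shape: a passage plus a precomputed vector similarity in [0, 1].
interface ScoredPassage {
  text: string;
  vectorScore: number;
}

// Simple keyword signal: the fraction of distinct query words that
// appear verbatim in the passage.
function keywordScore(query: string, text: string): number {
  const tokenize = (s: string): string[] =>
    s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const queryWords = new Set(tokenize(query));
  if (queryWords.size === 0) return 0;
  const passageWords = new Set(tokenize(text));
  let matched = 0;
  queryWords.forEach((word) => {
    if (passageWords.has(word)) matched++;
  });
  return matched / queryWords.size;
}

// Blend both signals and sort best-first. vectorWeight is an assumed knob.
function rankHybrid(
  query: string,
  passages: ScoredPassage[],
  vectorWeight = 0.7
): ScoredPassage[] {
  const combined = (p: ScoredPassage): number =>
    vectorWeight * p.vectorScore +
    (1 - vectorWeight) * keywordScore(query, p.text);
  return passages.slice().sort((a, b) => combined(b) - combined(a));
}

// Invented passages and similarity values.
const ranked = rankHybrid("what if a supplier gets hacked", [
  { text: "Office hours are 9 to 5.", vectorScore: 0.05 },
  { text: "The supplier invoice must be paid within 30 days.", vectorScore: 0.35 },
  { text: "Vendor risk management covers breaches at third-party providers.", vectorScore: 0.82 },
]);
```

In this toy run the vendor-risk passage ranks first even though it shares no words with the query, while the invoice passage gets a small lift over the office-hours one because it contains "supplier" verbatim.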

A few practical implications worth understanding.

It works on phrasing the document doesn't use. If the PDF talks about "vendor risk management" and you search "what if a supplier gets hacked," semantic search can still find the right section. Ctrl+F never will.

It returns ranked results, not just hits. You see the best-matching passages first. On a 60-page PDF, that's the difference between reading one paragraph and skimming twenty.

It still understands literal matches. If the exact word you typed appears in the document, that passage gets boosted too. You don't lose Ctrl+F-style behavior; you gain a layer on top of it.

It doesn't generate text. The model finds passages, not summaries. The answer is whatever the PDF already says. That matters if you don't want an AI inventing things about your contract or research paper.

A walkthrough on a real PDF: the Wikipedia World War I article

To make this concrete, take a long PDF most people have never tried to search semantically: Wikipedia's article on World War I. It's freely available, easy to replicate (open the article, use Chrome's "Print → Save as PDF"), and runs about 60 pages — long enough that scrolling through Ctrl+F matches stops being a strategy.

Open https://en.wikipedia.org/wiki/World_War_I, save it as a PDF, and open the file in ctrlQuery's viewer. Three queries on the same document.

Query 1: "What event triggered the start of the war?"

Ctrl+F for "triggered" returns nothing useful: the word doesn't appear in any heading and barely appears in the body. Searching for "start" returns dozens of scattered matches in unrelated paragraphs, none of which is the section that explains what actually set off the war.

Smart Search ranks the "Background" section at the top, opening with:

The causes of World War I included the rise of the German Empire and decline of the Ottoman Empire, which disturbed the long-standing balance of power in Europe, the exacerbation of imperial rivalries, and an arms race between the great powers.

That's the section that walks through what actually caused the war, which is what the question is really asking.

Query 2: "What happened to Germany after they lost?"

Ctrl+F for "Germany" returns hundreds of literal matches across diplomatic history, troop movements, casualty counts, and treaty negotiations. Nothing tells you which match sits in the section you actually want, and there's no faster way to find it than reading through all of them.

Smart Search ranks the "Aftermath" section at the top:

In the aftermath of the war, the German, Austro-Hungarian, Ottoman, and Russian empires disappeared. Numerous nations regained their former independence, and new ones were created. Four dynasties fell as a result of the war...

Direct, on-topic, exactly the section a colleague would point you to.

Query 3: "Why did the United States join the war?"

Ctrl+F for "United States" returns dozens of mentions scattered across diplomatic discussion, force counts, casualty figures, and post-war negotiations. None of them directly answers the question of why the US actually entered.

Smart Search surfaces the paragraph that addresses it:

Despite his conviction that Germany must be defeated, Wilson went to war to ensure the US played a leading role in shaping the peace, which meant preserving the AEF as a separate military force, rather than being absorbed into British or French units.

The actual answer, found by meaning instead of keyword.

The pattern across all three queries is the same: you ask the question the way you'd ask a colleague, and ctrlQuery finds the section that answers it, regardless of whether the author used your words.

Privacy: why this matters specifically for PDFs

Of all the documents people open in a browser, PDFs tend to be the most privacy-sensitive. The format gets used for contracts, signed agreements, medical records, internal reports, M&A diligence, legal filings, and research bound by confidentiality requirements. "Drag this PDF into a chatbox and ask it questions" is convenient, but it also means the document gets posted to a third party's server, processed by their model, and retained in whatever logs they keep.

Most AI PDF tools work exactly that way. You upload the PDF, their backend extracts the text, embeds it on their server, and runs queries against their hosted index. Sometimes the upload is explicit. Sometimes it's hidden behind a Chrome extension that quietly POSTs the page content to an API in the background. (For a broader audit of which competitors do this, see Best AI Ctrl+F Alternatives in 2026.)

ctrlQuery doesn't do that. The embedding model runs inside your browser tab. The PDF's text gets chunked and indexed in memory locally. Queries match against that local index. Nothing about the file is uploaded, no API call is made with its content, and there's no server-side log of what you searched for.
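For a sense of what "chunked and indexed in memory" can look like, here is a minimal sketch. The paragraph-splitting rule, the `Embedder` type, and the toy embedder are assumptions for illustration, not ctrlQuery's code; a real embedder would be the in-browser model and would run asynchronously.

```typescript
// Hypothetical types for this sketch.
type Embedder = (text: string) => number[];

interface IndexedChunk {
  id: number;
  text: string;
  vector: number[];
}

// Split extracted text into paragraph chunks on blank lines,
// collapsing line wraps and dropping empty chunks.
function chunkByParagraph(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.replace(/\s+/g, " ").trim())
    .filter((p) => p.length > 0);
}

// Build the index as a plain array held in memory.
// Nothing in this function touches the network or the disk.
function buildLocalIndex(text: string, embed: Embedder): IndexedChunk[] {
  return chunkByParagraph(text).map((chunk, id) => ({
    id,
    text: chunk,
    vector: embed(chunk),
  }));
}

// Toy embedder for demonstration only: [character count, word count].
const toyEmbed: Embedder = (t) => [t.length, t.split(" ").length];

const localIndex = buildLocalIndex(
  "First paragraph of the contract.\n\nSecond paragraph,\nwrapped across lines.\n\n",
  toyEmbed
);
```

The index is just an array in the tab's memory, so it disappears when the tab closes and there is no server-side copy to log or retain.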

If you're a lawyer searching a deal document, a doctor reviewing a clinical paper that mentions patient data, or anyone working under an NDA, this isn't a small detail. It's the whole reason to choose one tool over another.

FAQ

Does it work on scanned PDFs?

No, not directly. Semantic search needs a text layer. If your PDF is an image (a scanned document, a photographed page, a faxed contract), there's nothing for ctrlQuery to embed. Run it through OCR first — Tesseract is free and cross-platform, Apple Preview handles it on Mac, and Adobe Acrobat handles it everywhere if you have a license. Save the OCR'd version and reopen that in Chrome. After that, Smart Search works normally.

How big can the PDF be?

ctrlQuery caps PDFs at 200MB. Anything under that should index without issue. The first-time index on a long PDF (say, 300 pages) takes a few seconds while embeddings get computed; after that, queries are instant. The index lives in memory only while the tab is open, so reopening the same PDF triggers a quick re-index. The cached model is not downloaded again.

Does it work offline?

Yes, after your first Smart Search. The embedding model (about 30MB) is downloaded once the first time you toggle Smart Search and is cached in your browser. After that, you can be on a plane with no Wi-Fi and Smart Search still works. The PDF, the model, and the index all live on your machine.

Does Chrome's PDF viewer matter?

ctrlQuery uses its own PDF viewer built on PDF.js, so it works whether or not Chrome's built-in PDF viewer is enabled. If you've replaced Chrome's viewer with Acrobat or another handler, you can still use ctrlQuery by opening the PDF in a ctrlQuery tab directly.

What about local PDFs on my computer?

Local PDFs (URLs starting with file:///) work, but Chrome blocks all extensions from reading local files by default. The first time you try to open one in ctrlQuery's viewer, you'll see a panel walking you through the fix: open chrome://extensions, find ctrlQuery, click Details, and toggle on "Allow access to file URLs." After that one-time toggle, local PDFs work the same as any other.
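If you're unsure whether a given tab counts as a local file, the check is just the URL scheme. A small sketch, with `isLocalFileUrl` as a hypothetical helper rather than anything ctrlQuery exposes:

```typescript
// Hypothetical helper: true when the URL points at a file on the local disk.
function isLocalFileUrl(url: string): boolean {
  return /^file:\/\//i.test(url.trim());
}

const localPdf = isLocalFileUrl("file:///Users/me/contract.pdf");
const hostedPdf = isLocalFileUrl("https://example.com/report.pdf");
```

Here `localPdf` is true and `hostedPdf` is false. Chrome also gives extensions a way to check the toggle itself, `chrome.extension.isAllowedFileSchemeAccess`; whether ctrlQuery's panel relies on it is not something this post states.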

What about PDFs with weird tables or columns?

The text layer matters more than the visual layout. ctrlQuery extracts text in reading order via PDF.js, which handles most multi-column documents well. Heavily designed PDFs (magazine layouts, infographics) can produce out-of-order chunks, which slightly hurts precision but doesn't break the search.

Does it work across multiple PDFs at once?

Not currently. Each PDF is indexed in the tab it's open in. Cross-document semantic search is a different product (closer to a personal knowledge base) and isn't part of ctrlQuery's scope. The goal here is a smarter Ctrl+F for the document in front of you, not a research workspace.

Try it on your next PDF

If you've ever spent twenty minutes scrolling a long PDF because Ctrl+F kept missing the section you knew was in there, ctrlQuery is built for that specific frustration. Install it, open the most search-resistant PDF on your machine, toggle Smart Search, and ask the question the way you'd actually ask it.
