
Replied to Nielso

**Elias Probst** @eliasp@mastodon.social · Jul 3, 2024

@nielso may I preach to you about our Lord & Savior #PaperlessNGX (who also multiplies the miracles of #OCRmyPDF for the benefit of his disciples)?

**Nielso** @nielso@digitalcourage.social · Jul 3, 2024

The small joys of the free world: hacking together a script that, on the little Samba server (a Fujitsu thin client), picks up the scans dropped there by the Brother office monster, runs them through #ocrmypdf, and puts the result back on the Samba share.
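A script along those lines might look roughly like the sketch below. It is not Nielso's actual script: the inbox/outbox layout, the `done` subfolder, and the `OCR_CMD` override (there only so the loop can be tried without OCRmyPDF installed) are all assumptions made for illustration.

```shell
# Sketch of a scan-inbox processor for a shared folder.
# OCR_CMD is a hypothetical override; by default it is the real ocrmypdf CLI.
OCR_CMD="${OCR_CMD:-ocrmypdf}"

# process_scans INBOX OUTBOX
# Runs every PDF found in INBOX through OCR_CMD and writes the result,
# under the same file name, to OUTBOX. The original is moved to
# INBOX/done only after OCR succeeded, so failed scans stay visible.
process_scans() {
    local inbox="$1" outbox="$2" pdf name
    mkdir -p "$outbox" "$inbox/done"
    for pdf in "$inbox"/*.pdf; do
        [ -e "$pdf" ] || continue      # the glob matched nothing
        name="$(basename "$pdf")"
        if "$OCR_CMD" "$pdf" "$outbox/$name"; then
            mv "$pdf" "$inbox/done/$name"
        else
            echo "OCR failed for $name, leaving it in place" >&2
        fi
    done
}
```

Called as `process_scans /srv/samba/scans /srv/samba/ocr` (hypothetical paths) from cron or a systemd timer, this would cover the workflow described. One caveat: a scanner may still be writing a file when the loop reaches it, so a real setup would probably wait until a file has stopped changing before processing it.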

Replied in thread

**Peter Vágner** @pvagner@fedi.ml · Mar 17, 2024

@meatbag I'm on Linux and the best I have found working for me is #ocrmypdf github.com/ocrmypdf/OCRmyPDF
It uses #tesseract under the hood and for static text it's okay. For tables and other material that is difficult to parse it's not useful.
When a PDF already has text, the tools I use for reading it include #firefox and #evince

Replied in thread

**Elias Probst** @eliasp@mastodon.social · Feb 25, 2024

@lauren for going down this route, you might want to give #OCRmyPDF a try:
https://ocrmypdf.readthedocs.io/en/latest/cookbook.html#produce-pdf-and-text-file-containing-ocr-text

@alastair @jamesbritt
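The cookbook entry linked above is about OCRmyPDF's `--sidecar` option, which writes the recognized text to a plain-text file alongside the searchable PDF. A minimal wrapper, assuming an output naming scheme (`-ocr.pdf` and `.txt` next to the input) that is purely illustrative, could be:

```shell
# OCR_CMD is a hypothetical override; by default it is the real ocrmypdf CLI.
OCR_CMD="${OCR_CMD:-ocrmypdf}"

# ocr_with_text INPUT.pdf
# Produces INPUT-ocr.pdf (searchable PDF) and INPUT.txt (OCR text only)
# next to the input file, using ocrmypdf's --sidecar option.
ocr_with_text() {
    local input="$1" base
    base="${input%.pdf}"               # strip the .pdf suffix
    "$OCR_CMD" --sidecar "$base.txt" "$input" "$base-ocr.pdf"
}
```

For example, `ocr_with_text letter.pdf` would leave `letter-ocr.pdf` and `letter.txt` in the same directory.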

**Albert Cardona** @albertcardona@mathstodon.xyz · Dec 12, 2023 *

When you find a webpage that offers you a book but you can't download it, and you can't right-click to save the images of its pages, well – the page has loaded the images. Therefore the images are somewhere in your browser. What to do?

Knowing a bit of how web pages are structured and built helps make the most of what you see online.

1. In your browser, open the developer tools (push F12).

2. Go to the "Network" tab and restrict the view to "Images" and "Media" (see the upper right side).

3. Zoom into the book to ensure pages are of high resolution, then page through it.

4. You will notice new rows appearing in the table of the "Network" tab of the Developer Tools.

5. Now hover over a row and the image may even be previewed; in any case, just right-click the row and save the image.

There are scripts online to automate this, but if all you are after are a few pages, this suffices.

To assemble the pages into a PDF, use e.g.:

$ img2pdf *.jpg -o book.pdf

... and even OCR them if you like:

$ ocrmypdf book.pdf book-OCR.pdf

Both programs can be installed with:

$ sudo apt-get install img2pdf ocrmypdf

... in Ubuntu, Debian, and the like.

Or, import each into a page of a multi-page #Inkscape document and save it as a PDF.

#img2pdf #ocrmypdf
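One detail worth watching with the `img2pdf` step: a shell glob sorts names lexicographically, so `page10.jpg` lands before `page2.jpg` unless the saved files are zero-padded. A small sketch that orders the pages numerically first, assuming GNU `sort -V` and GNU `xargs -d` (both present on Ubuntu and Debian) and file names without newlines; the function names are made up for illustration:

```shell
# list_pages DIR
# Prints the .jpg files directly inside DIR in version-sort order,
# one path per line, so that page2.jpg comes before page10.jpg.
list_pages() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f -name '*.jpg' | sort -V
}

# build_book DIR OUTPUT.pdf
# Hands the ordered page list to img2pdf. xargs -d '\n' keeps file
# names containing spaces intact. Intended for a modest number of pages,
# where all names fit into a single img2pdf invocation.
build_book() {
    local dir="$1" out="$2"
    list_pages "$dir" | xargs -d '\n' img2pdf -o "$out"
}
```

After `build_book ./pages book.pdf`, the `ocrmypdf book.pdf book-OCR.pdf` step from the post applies unchanged.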

Replied in thread

**Frederik Elwert** @felwert@mstdn.social · Nov 30, 2023

@KlusikEckert I think #OCRMyPDF is brilliant, but as a command-line program it's perhaps not all that beginner-friendly.

**zrzz** @zrzz@mastodon.social · Nov 18, 2023

I am a rulebook hoarder. Whenever I take a closer look at a game, downloading the rulebook is the first thing I do. I have over 2500 boardgame-related PDF files. I access them using pdf-tools in #Emacs, index them using #recoll, and I use a small hack to make M-x pdfgrep search using the recoll index. I use the #OCRmyPDF tool to OCR the ones that didn't come with embedded text.
#boardgames
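How zrzz picks out the files without embedded text is not stated. One possible approach, sketched here under the assumption that `pdftotext` from poppler-utils is available, is to treat a PDF as needing OCR when no non-whitespace text can be extracted from it. The function names and the `PDFTOTEXT_CMD` override are hypothetical.

```shell
# PDFTOTEXT_CMD is a hypothetical override; by default it is poppler's pdftotext.
PDFTOTEXT_CMD="${PDFTOTEXT_CMD:-pdftotext}"

# has_text FILE.pdf
# Succeeds if at least one non-whitespace character can be extracted.
# pdftotext separates pages with form feeds, which count as whitespace here.
has_text() {
    "$PDFTOTEXT_CMD" "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# list_needing_ocr DIR
# Prints every PDF under DIR that has no extractable text, one per line.
# Assumes file names without newlines.
list_needing_ocr() {
    local dir="$1" pdf
    find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        has_text "$pdf" || printf '%s\n' "$pdf"
    done
}
```

The resulting list can then be fed to `ocrmypdf` one file at a time. For PDFs that mix scanned and text pages, OCRmyPDF's own `--skip-text` option, which leaves pages that already contain text untouched, may be the simpler route.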

#OCRmyPDF