@nielso may I preach to you about our Lord & Savior #PaperlessNGX (who also multiplies the miracles of #OCRmyPDF for the benefit of his disciples)?
The small joys of the free world: hacking together a script that picks up the scans the Brother office monster drops onto the little Samba server (a Fujitsu thin client), runs them through #ocrmypdf, and drops the result back onto the Samba share.
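The original script isn't shown, so the following is only a minimal sketch of the same idea. The INBOX/OUTBOX paths and the `-OCR.pdf` suffix are hypothetical. `--skip-text` is a real ocrmypdf option that leaves pages that already contain text untouched instead of aborting.

```shell
#!/bin/sh
# Hypothetical paths: point these at the folders the scanner writes to on the Samba share.
INBOX="${INBOX:-/srv/samba/scans/inbox}"
OUTBOX="${OUTBOX:-/srv/samba/scans/ocr}"

# Map an input scan to its output path: .../scan.pdf -> $OUTBOX/scan-OCR.pdf
ocr_name() {
    base=$(basename "$1" .pdf)
    printf '%s/%s-OCR.pdf\n' "$OUTBOX" "$base"
}

# OCR every PDF in the inbox; delete the original scan only if OCR succeeded.
process_inbox() {
    for f in "$INBOX"/*.pdf; do
        [ -e "$f" ] || continue            # glob matched nothing: inbox is empty
        out=$(ocr_name "$f")
        if ocrmypdf --skip-text "$f" "$out"; then
            rm -- "$f"
        fi
    done
}

process_inbox
```

Run from cron or a systemd timer, this polls the share. A real setup may also need to wait until the scanner has finished writing a file before picking it up.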
@lauren if you're going down this route, you might want to give #OCRmyPDF a try:
https://ocrmypdf.readthedocs.io/en/latest/cookbook.html#produce-pdf-and-text-file-containing-ocr-text
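The linked cookbook recipe relies on ocrmypdf's `--sidecar` option, which writes the recognized text to a separate file alongside the searchable PDF. A small sketch follows. The wrapper name `ocr_with_text` and the output file names are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper: OCR a PDF and also save the recognized text as plain text.
# book.pdf -> book-OCR.pdf (searchable PDF) and book.txt (sidecar text file)
ocr_with_text() {
    in="$1"
    out="${in%.pdf}-OCR.pdf"
    txt="${in%.pdf}.txt"
    ocrmypdf --sidecar "$txt" "$in" "$out"
}
```

Usage: `ocr_with_text book.pdf`.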
When you find a webpage that offers you a book but you can't download it, and you can't right-click to save the images of its pages, well – the page has loaded the images. Therefore the images are somewhere in your browser. What to do?
Knowing a bit of how web pages are structured and built helps make the most of what you see online.
1. In your browser, open the developer tools (press F12).
2. Go to the "Network" tab and restrict the view to "Images" and "Media" (see the upper right side).
3. Zoom into the book so the pages load at high resolution, then page through it.
4. You will notice new rows appearing in the table of the "Network" tab of the Developer Tools.
5. Hover over them and the image may even be previewed; in any case, just right-click and save it.
There are scripts online to automate this, but if all you are after are a few pages, this suffices.
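If there are more than a few pages, one option is to copy each image URL from the Network tab (right-click, copy URL) into a text file and download them in one go. The sketch below assumes a hypothetical `urls.txt` with one URL per line, and pages that really are JPEGs. It names the files `page-001.jpg`, `page-002.jpg`, … so that a later `*jpg` glob lists them in page order. Some sites also check cookies or the referrer, in which case plain curl may not be enough.

```shell
#!/bin/sh
# Hypothetical helper: download page images listed one URL per line in a file,
# saving them as zero-padded page-NNN.jpg so shell globs sort them in page order.
fetch_pages() {
    n=0
    while IFS= read -r url || [ -n "$url" ]; do   # also handles a missing final newline
        [ -n "$url" ] || continue                  # skip blank lines
        n=$((n + 1))
        name=$(printf 'page-%03d.jpg' "$n")
        curl -fsSL -o "$name" "$url"               # -f: fail on HTTP errors, -L: follow redirects
    done < "$1"
}
```

Usage: `fetch_pages urls.txt`.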
To combine the pages into a PDF, use, e.g.:
$ img2pdf *jpg -o book.pdf
... and even OCR them if you like:
$ ocrmypdf book.pdf book-OCR.pdf
Both programs can be installed with:
$ sudo apt install img2pdf ocrmypdf
... in Ubuntu, Debian, and the like.
Or, import each image into a page of a multi-page #Inkscape document and save it as a PDF.
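With `img2pdf *jpg -o book.pdf`, the page order is whatever order the shell expands the glob in, which is lexicographic: `page10.jpg` comes before `page2.jpg`. If the saved pages are named like the hypothetical `page1.jpg`, `page2.jpg`, …, a sketch like this zero-pads the numbers first:

```shell
#!/bin/sh
# Hypothetical helper: rename pageN.jpg to page-NNN.jpg in the current directory
# so that "img2pdf *jpg -o book.pdf" picks the pages up in numeric order.
pad_pages() {
    for f in page[0-9]*.jpg; do
        [ -e "$f" ] || continue
        num=${f#page}                              # strip leading "page"
        num=${num%.jpg}                            # strip trailing ".jpg"
        case $num in *[!0-9]*) continue ;; esac    # skip names that aren't pageN.jpg
        num=${num#"${num%%[!0]*}"}                 # drop leading zeros (printf would read 08 as octal)
        [ -n "$num" ] || num=0
        mv -- "$f" "$(printf 'page-%03d.jpg' "$num")"
    done
}
```

Three digits are enough for books up to 999 pages; widen the `%03d` field for longer ones.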
@KlusikEckert I think #OCRMyPDF is brilliant, but as a command-line program it may not be all that beginner-friendly.
I am a rulebook hoarder. Whenever I take a closer look at a game, downloading the rulebook is the first thing I do. I have over 2500 boardgame-related PDF files. I access them using pdf-tools in #Emacs, index them using #recoll, and use a small hack to make M-x pdfgrep search using the recoll index. I use the #OCRmyPDF tool to OCR the ones that didn't come with embedded text.
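The post doesn't say how the text-less files are found, so here is one possible approach, sketched under assumptions. It treats a PDF as needing OCR when `pdftotext` (from poppler-utils) extracts nothing but whitespace. The function names and the temporary-file naming are hypothetical. The OCR result is written to a temporary file and only replaces the original if ocrmypdf succeeds.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PDF yields no extractable text.
needs_ocr() {
    text=$(pdftotext -q "$1" - 2>/dev/null | tr -d '[:space:]')
    [ -z "$text" ]
}

# OCR, in place, every PDF in a directory that has no embedded text.
ocr_collection() {
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue
        if needs_ocr "$f"; then
            tmp="${f%.pdf}.ocr-tmp.pdf"
            ocrmypdf "$f" "$tmp" && mv -- "$tmp" "$f"   # replace only on success
        fi
    done
}
```

Usage, with a hypothetical folder: `ocr_collection ~/rulebooks`. Alternatively, `ocrmypdf --skip-text` leaves pages that already contain text alone, so every file can be passed through it unconditionally.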
#boardgames