Nath's Blog

Life. Through the eyes of Nathan Coad.

Raspberry Pi Document Scanner

Some time back I bought a Fujitsu ScanSnap S1300i and connected it to a Raspberry Pi 3 to create a network-attached document scanner. It worked OK, but the Pi 3 was quite slow at handling OCR duties, so when the Pi 4 was announced I was keen to upgrade.

Some time passed before I finally got around to giving it a go. Unfortunately I’d forgotten the resources and details I’d used to get the original setup working, so I muddled my way through again with what I could find this time around, comparing against my old Pi 3 as I went.

Below is the initial setup I had to do. You will need the scanner's firmware file (the "driver"). My second-hand ScanSnap S1300i didn’t come with a CD, but the file is pretty easy to find online; https://github.com/stevleibelt/scansnap-firmware/blob/master/1300i_0D12.nal looks like one example.

sudo apt-get install tesseract-ocr tesseract-ocr-eng sane pdftk imagemagick scanbd bc img2pdf ocrmypdf

sudo groupadd scanner
sudo usermod -a -G scanner pi
sudo usermod -a -G saned pi

sudo mkdir -p /usr/share/sane/epjitsu/
sudo cp 1300i_0D12.nal /usr/share/sane/epjitsu/

Edit /etc/scanbd/scanbd.conf and set:

  • debug-level = 7 (to see errors more easily while setting up; change this back to 4 or lower when you’re happy everything is working)
  • user = pi (to run the script and the scanning process as user pi)
  • the script line in the scan action block to script = "/home/pi/scripts/scan.sh", so it looks like this:
    action scan {
            filter = "^scan.*"
            numerical-trigger {
                    from-value = 1
                    to-value   = 0
            }
            desc   = "Scan to file"
            # script must be an relative path starting from scriptdir (see above),
            # or an absolute pathname.
            # It must contain the path to the action script without arguments
            # Absolute path example: script = "/some/path/foo.script
            script = "/home/pi/scripts/scan.sh"
    }

Replace the path to the script as needed.
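
After editing, a quick sanity check can catch the usual slips (wrong user, wrong script path, script not marked executable). The following is only a rough sketch: check_scanbd_conf is a hypothetical helper, not something that ships with scanbd, and it simply greps for the settings described above rather than parsing the config properly.

```shell
#!/bin/bash
# Sketch: sanity-check an edited scanbd.conf.
# check_scanbd_conf is a hypothetical helper, not part of scanbd.

check_scanbd_conf() {
    # usage: check_scanbd_conf <conf-file> <user> <script-path>
    local conf="$1" user="$2" script="$3" status=0

    # is there a line of the form "user = <user>"?
    grep -Eq "^[[:space:]]*user[[:space:]]*=[[:space:]]*${user}[[:space:]]*\$" "$conf" \
        || { echo "user is not set to ${user}" >&2; status=1; }

    # does any action point at our script? (assumes the 'script = "..."' spacing)
    grep -Fq "script = \"${script}\"" "$conf" \
        || { echo "no action uses ${script}" >&2; status=1; }

    # the script has to exist and be executable for scanbd to run it
    [ -x "$script" ] \
        || { echo "${script} is missing or not executable" >&2; status=1; }

    return "$status"
}
```

With the paths used in this post it would be called as check_scanbd_conf /etc/scanbd/scanbd.conf pi /home/pi/scripts/scan.sh, and it returns non-zero if any of the three checks fails.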

Check to make sure that the scanner is detected properly.

sudo sane-find-scanner -q

found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004
found USB scanner (vendor=0x0424, product=0xec00) at libusb:001:003

sudo scanimage -L

device `epjitsu:libusb:001:004' is a FUJITSU ScanSnap S1300i scanner

Take note of the vendor and product IDs; you will need them to create a custom udev rule so that the pi user can access the scanner. Using sudo, create a new file named /etc/udev/rules.d/scanner.rules and add the following line, changing the vendor and product IDs (without the 0x prefix) to match the output of the sane-find-scanner command above.

SUBSYSTEM=="usb", ATTRS{idVendor}=="04c5", ATTRS{idProduct}=="128d", MODE="0664", GROUP="scanner", ENV{libsane_matched}="yes"
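
If you would rather not copy the IDs by hand, the rule can be generated from a line of sane-find-scanner output. This is a minimal sketch; make_udev_rule is a hypothetical helper name, and it assumes the output format shown above (vendor=0x.... and product=0x.... with four hex digits each).

```shell
#!/bin/bash
# Sketch: turn one line of `sane-find-scanner -q` output into a udev rule.
# make_udev_rule is a hypothetical helper, not part of SANE or scanbd.

make_udev_rule() {
    local line="$1" vendor product
    # pull out the four hex digits that follow "vendor=0x" and "product=0x"
    vendor=$(printf '%s\n' "$line" | sed -n 's/.*vendor=0x\([0-9a-fA-F]\{4\}\).*/\1/p')
    product=$(printf '%s\n' "$line" | sed -n 's/.*product=0x\([0-9a-fA-F]\{4\}\).*/\1/p')
    if [ -z "$vendor" ] || [ -z "$product" ]; then
        echo "could not find vendor/product IDs in: $line" >&2
        return 1
    fi
    printf 'SUBSYSTEM=="usb", ATTRS{idVendor}=="%s", ATTRS{idProduct}=="%s", MODE="0664", GROUP="scanner", ENV{libsane_matched}="yes"\n' \
        "$vendor" "$product"
}

# example using the scanner line from the output above
make_udev_rule 'found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004'
```

Piping the result through sudo tee /etc/udev/rules.d/scanner.rules writes the file in one step.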

Reboot and ensure that everything starts correctly. On my scanner the scan button will change from a blinking blue light to a solid blue light when scanbd is monitoring the button successfully.

My scan scripts might not be the best; I put them together from many different sources long ago. Apologies for not attributing whoever originally wrote them! These are the contents of /home/pi/scripts/scan.sh. As you can see, it mostly calls other scripts to do the actual work.

#!/bin/bash

# directory this script lives in, so the helper scripts can be found
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
JOBID=$(date '+%Y-%m-%d_%H%M%S')

# run the scanning in foreground
"$DIR/01-scan.sh" "$JOBID"

# execute processing in background
(
    # lock processing to make sure only one is running at a time
    (
        flock -x 200 # wait for lock
        "$DIR/02-ocrmypdf.sh" "$JOBID"
        "$DIR/03-nccopy.sh" "$JOBID"
    ) 200>/tmp/scan.lock
) &
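
To see what the flock part buys you, here is a small stand-alone sketch of the same locking pattern with a dummy workload in place of OCR and upload. The LOCK and LOG temp files and the process_job function are made up for the demo. Three jobs are started at once, but each has to wait for the lock, so every "start" line in the log is immediately followed by its own "end" line.

```shell
#!/bin/bash
# Sketch of the locking pattern from scan.sh, with a stand-in workload.
# LOCK, LOG and process_job are hypothetical names used only for this demo.
LOCK=$(mktemp)
LOG=$(mktemp)

process_job() {
    (
        flock -x 200            # block until no other job holds the lock
        echo "start $1" >> "$LOG"
        sleep 0.2               # stand-in for the OCR and upload steps
        echo "end $1" >> "$LOG"
    ) 200>"$LOCK"
}

# launch three jobs at once, as if the scan button were pressed three times
for job in a b c; do
    process_job "$job" &
done
wait

cat "$LOG"
```

The order in which the three jobs get the lock can vary from run to run, but their log lines never interleave. That is the point: a second stack of paper can be scanned while the first is still being OCRed, without two OCR runs fighting over the Pi's CPU.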

Here is 01-scan.sh

#!/bin/bash

BASE="/tmp"

if [ -z "$1" ]; then
    echo "Usage: $0 <jobid>"
    echo
    echo "Please provide unique jobid name as first parameter"
    exit 1
fi

OUTPUT="$BASE/$1"
mkdir -p "$OUTPUT"

echo 'scanning...'
scanimage --resolution 300 \
          --batch="$OUTPUT/scan_%03d.pnm" \
          --format=pnm \
          --mode Gray \
          --source 'ADF Duplex'
echo "Output in $OUTPUT/scan*.pnm"
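
One thing 01-scan.sh does not do is check that any pages actually came out, for example when the button is pressed with an empty feeder. A possible guard, sketched here with a hypothetical count_pages helper, is to count the scan_*.pnm files in the job directory and stop early when there are none.

```shell
#!/bin/bash
# Sketch: count the pages produced by a scan job.
# count_pages is a hypothetical helper, not part of the original scripts.

count_pages() {
    # usage: count_pages <job-dir>; prints how many scan_*.pnm files exist
    local dir="$1" n=0 f
    for f in "$dir"/scan_*.pnm; do
        # when nothing matches, the glob stays literal and -e is false
        if [ -e "$f" ]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# possible use at the end of 01-scan.sh:
#   if [ "$(count_pages "$OUTPUT")" -eq 0 ]; then
#       echo "no pages scanned" >&2
#       exit 1
#   fi
```

For the non-zero exit code to be of any use, scan.sh would also need to check it before starting the background processing; as written it carries on regardless.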

02-ocrmypdf.sh

#!/bin/bash

LANGUAGE="eng" # the tesseract language
BASE="/tmp"
export LC_ALL=C.UTF-8
export LANG=C.UTF-8

if [ -z "$1" ]; then
    echo "Usage: $0 <jobid>"
    echo
    echo "Please provide existing jobid as first parameter"
    exit 1
fi

OUTPUT="$BASE/$1"

if [ ! -d "$OUTPUT" ]; then
    echo "jobid does not exist"
    exit 1
fi

cd "$OUTPUT"

# check if the page is blank
# http://philipp.knechtges.com/?p=190
echo 'checking for blank pages...'
for i in scan_*.pnm; do
    echo "${i}"
    histogram=$(convert "${i}" -threshold 50% -format %c histogram:info:-)
    white=$(echo "${histogram}" | grep "#FFFFFF" | sed -n 's/^ *\(.*\):.*$/\1/p')
    black=$(echo "${histogram}" | grep "#000000" | sed -n 's/^ *\(.*\):.*$/\1/p')
    # a colour missing from the histogram means zero pixels of that colour
    white=${white:-0}
    black=${black:-0}
    if [ "${white}" -eq 0 ]; then
        continue    # all-black page: not blank, and avoids dividing by zero
    fi
    blank=$(echo "scale=4; ${black}/${white} < 0.005" | bc)

    if [ "${blank}" -eq 1 ]; then
        echo "${i} seems to be blank - removing it..."
        rm "${i}"
    fi
done

echo 'Combining scans into PDF ... '
img2pdf scan_*.pnm -o intermediate.pdf
echo 'Performing OCR ... '
ocrmypdf --tesseract-timeout 300 --tesseract-oem 1 --deskew --clean --verbose -l "$LANGUAGE" --sidecar "${1}.txt" intermediate.pdf "${1}.pdf"
echo "created $OUTPUT/${1}.pdf"
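
The blank-page test in this script boils down to one comparison: after thresholding to pure black and white, a page counts as blank when black/white is below 0.005. Since black/white < 0.005 is the same as black × 200 < white, the decision can also be made with plain shell integer arithmetic. The sketch below uses a hypothetical is_blank helper and made-up pixel counts for a 300 dpi A4 page to show where the cut-off falls.

```shell
#!/bin/bash
# Sketch: the blank-page decision as a standalone function.
# is_blank is a hypothetical helper; it takes black and white pixel counts.

is_blank() {
    local black="$1" white="$2"
    # black/white < 0.005  is equivalent to  black * 200 < white
    [ $(( black * 200 )) -lt "$white" ]
}

# A 300 dpi A4 page is roughly 2480 x 3508 = 8,699,840 pixels.

# 20,000 black pixels (a few specks of dust): ratio is about 0.0023
if is_blank 20000 8679840; then echo "blank"; else echo "not blank"; fi     # prints "blank"

# 400,000 black pixels (a few lines of text): ratio is about 0.048
if is_blank 400000 8299840; then echo "blank"; else echo "not blank"; fi    # prints "not blank"
```

If real pages with very little on them (a signature, a page number) get thrown away, lowering the 0.005 threshold is the knob to turn.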

Lastly, 03-nccopy.sh just does a WebDAV upload to my Nextcloud instance via a single curl command: curl -X PUT -u "$USERNAME:$PASS" --data-binary @"$LOCAL" "$URI"
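
For completeness, here is a rough sketch of how such an upload script could be laid out. It is not the actual 03-nccopy.sh: the function names and the NC_SERVER, NC_USER, NC_PASS and NC_FOLDER variables are all assumptions, and the URL layout is the standard Nextcloud WebDAV path (remote.php/dav/files/<user>/...), which may differ from the URI used on the author's instance.

```shell
#!/bin/bash
# Sketch of what an upload script along the lines of 03-nccopy.sh might
# look like. Function names and the NC_* variables are hypothetical.

webdav_url() {
    # usage: webdav_url <server> <user> <remote-folder> <filename>
    printf '%s/remote.php/dav/files/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

upload_pdf() {
    # usage: upload_pdf <jobid>
    # expects NC_SERVER, NC_USER, NC_PASS and NC_FOLDER to be set
    local jobid="$1"
    local file="/tmp/$jobid/$jobid.pdf"   # where 02-ocrmypdf.sh writes its output

    [ -f "$file" ] || { echo "no PDF for job $jobid" >&2; return 1; }

    # --fail makes curl return non-zero on HTTP errors such as 401 or 404
    curl --fail --silent --show-error -X PUT \
         -u "$NC_USER:$NC_PASS" \
         --data-binary @"$file" \
         "$(webdav_url "$NC_SERVER" "$NC_USER" "$NC_FOLDER" "$jobid.pdf")"
}
```

The script would then end with upload_pdf "$1". Keeping the password in a file readable only by the pi user (or using a Nextcloud app password) is safer than hard-coding your main login.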

© 2024 Nath's Blog
