Some time back I bought a Fujitsu ScanSnap S1300i and connected it to a Raspberry Pi 3 to create a network-attached document scanner. It worked OK, but the Pi 3 was quite slow at handling OCR duties, so when the Pi 4 was announced I was keen to upgrade.
Some time passed before I finally got around to giving it a go. Unfortunately I’d forgotten the resources and details I’d used to get it working the first time, so I muddled my way through the setup again with whatever I could find this time around, comparing it against my old Pi 3.
Below is the initial setup I had to do. You will need the scanner’s firmware file: my second-hand ScanSnap S1300i didn’t come with a CD, but the file is pretty easy to find online. https://github.com/stevleibelt/scansnap-firmware/blob/master/1300i_0D12.nal looks like one example (download the raw file; that link itself points to GitHub’s web page for it).
sudo apt-get install tesseract-ocr tesseract-ocr-eng sane pdftk imagemagick scanbd bc img2pdf ocrmypdf
sudo groupadd scanner
sudo usermod -a -G scanner pi
sudo usermod -a -G saned pi
sudo mkdir -p /usr/share/sane/epjitsu/
sudo cp 1300i_0D12.nal /usr/share/sane/epjitsu/
Edit /etc/scanbd/scanbd.conf (it needs sudo) and set:
- debug-level = 7 (to see errors more easily while setting up; change this back to 4 or lower once you’re happy everything is working)
- user = pi (to run the script and the scanning process as user pi)
- the script line in the scan block to:
script = "/home/pi/scripts/scan.sh"
so it looks like this:
action scan {
    filter = "^scan.*"
    numerical-trigger {
        from-value = 1
        to-value = 0
    }
    desc = "Scan to file"
    # script must be a relative path starting from scriptdir (see above),
    # or an absolute pathname.
    # It must contain the path to the action script without arguments
    # Absolute path example: script = "/some/path/foo.script"
    script = "/home/pi/scripts/scan.sh"
}
Replace the path to the script as needed.
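If you’d rather script the debug-level and user changes than make them by hand, something like the sketch below could work. It assumes each setting sits on its own `key = value` line, as in the stock file; the function name set_scanbd_option is my own invention. It only touches the first uncommented match, so it is not suitable for the script lines (scanbd.conf has one per action block).

```bash
#!/bin/bash
# Sketch: set a top-level "key = value" option in a scanbd.conf-style file.
# Assumption: the option sits on its own line as "key = value". Only the first
# uncommented match is changed, so this suits debug-level and user but NOT the
# per-action script lines. set_scanbd_option is a hypothetical name.
set_scanbd_option() {
    local file="$1" key="$2" value="$3" tmp status
    tmp=$(mktemp) || return 1
    if awk -v key="$key" -v value="$value" '
        # first line whose first non-blank text is "<key> =" (commented lines start with #)
        !done && $0 ~ ("^[ \t]*" key "[ \t]*=") {
            match($0, /^[ \t]*/)                        # keep the original indentation
            print substr($0, 1, RLENGTH) key " = " value
            done = 1
            next
        }
        { print }
        END { exit (done ? 0 : 1) }                     # fail if the key was never found
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite in place, keeping owner and permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Try it on a copy of the file first; once it behaves, save it in a small script that calls `set_scanbd_option /etc/scanbd/scanbd.conf debug-level 7` and `set_scanbd_option /etc/scanbd/scanbd.conf user pi`, and run that script with sudo.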
Check that the scanner is detected properly:
sudo sane-find-scanner -q
found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004
found USB scanner (vendor=0x0424, product=0xec00) at libusb:001:003
sudo scanimage -L
device `epjitsu:libusb:001:004' is a FUJITSU ScanSnap S1300i scanner
Take note of the vendor and product IDs; you will need them to create a custom udev rule so that the pi user can access the scanner. Using sudo, create a new file named /etc/udev/rules.d/scanner.rules and add the following line, changing the vendor and product IDs to match the output of the sane-find-scanner command above.
SUBSYSTEM=="usb", ATTRS{idVendor}=="04c5", ATTRS{idProduct}=="128d", MODE="0664", GROUP="scanner", ENV{libsane_matched}="yes"
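Copying the IDs by hand works fine, but here is a sketch of building the rule straight from a line of sane-find-scanner output. It assumes the line contains `vendor=0xVVVV` and `product=0xPPPP` exactly as in the output shown earlier; udev_rule_from_line is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: build the udev rule above from one line of `sane-find-scanner -q`
# output. Assumes the line contains "vendor=0xVVVV" and "product=0xPPPP" as in
# the output shown earlier. udev_rule_from_line is a hypothetical name.
udev_rule_from_line() {
    local line="$1" vendor product
    # Extract the 4-digit hex IDs without the 0x prefix.
    vendor=$(sed -n 's/.*vendor=0x\([0-9a-fA-F]\{4\}\).*/\1/p' <<< "$line")
    product=$(sed -n 's/.*product=0x\([0-9a-fA-F]\{4\}\).*/\1/p' <<< "$line")
    if [ -z "$vendor" ] || [ -z "$product" ]; then
        echo "no vendor/product IDs found in: $line" >&2
        return 1
    fi
    # sysfs reports idVendor/idProduct in lowercase hex, so emit lowercase.
    printf 'SUBSYSTEM=="usb", ATTRS{idVendor}=="%s", ATTRS{idProduct}=="%s", MODE="0664", GROUP="scanner", ENV{libsane_matched}="yes"\n' \
        "${vendor,,}" "${product,,}"
}
```

On the Pi you could feed it the FUJITSU line and write the rule in one go, e.g. `udev_rule_from_line "$(sudo sane-find-scanner -q | grep -m1 FUJITSU)" | sudo tee /etc/udev/rules.d/scanner.rules`.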
Reboot and ensure that everything starts correctly. On my scanner, the scan button changes from a blinking blue light to a solid blue light once scanbd is successfully monitoring it.
My scan scripts might not be the best, but I put them together from many different sources long ago. Apologies for not attributing whoever originally wrote them! These are the contents of /home/pi/scripts/scan.sh; as you can see, it mostly calls other scripts to do the actual work.
#!/bin/bash
DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
JOBID=$(date '+%Y-%m-%d_%H%M%S')

# run the scanning in foreground
"$DIR/01-scan.sh" "$JOBID"

# execute processing in background
(
    # lock processing to make sure only one is running at a time
    (
        flock -x 200 # wait for lock
        "$DIR/02-ocrmypdf.sh" "$JOBID"
        "$DIR/03-nccopy.sh" "$JOBID"
    ) 200>/tmp/scan.lock
) &
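The flock trick is what lets you keep feeding documents while earlier ones are still being OCRed. The standalone demo below shows the same pattern: each background job opens file descriptor 200 on a shared lock file and waits in `flock -x` until no other job holds it. The temp files and the `sleep` standing in for the real processing are just for illustration.

```bash
#!/bin/bash
# Demonstration of the locking pattern in scan.sh: every background job opens
# file descriptor 200 on the same lock file and blocks in `flock -x 200` until
# no other job holds the lock, so jobs run one after another, never overlapping.
# LOCK and LOG are throwaway temp files used only for this demo.
LOCK=$(mktemp)
LOG=$(mktemp)
trap 'rm -f "$LOCK" "$LOG"' EXIT

process_job() {
    local id="$1"
    (
        flock -x 200              # wait for the exclusive lock
        echo "start $id" >> "$LOG"
        sleep 0.2                 # stand-in for 02-ocrmypdf.sh and 03-nccopy.sh
        echo "end $id" >> "$LOG"
    ) 200>"$LOCK"                 # lock is released when the subshell exits
}

# Simulate three documents being scanned in quick succession.
for id in a b c; do
    process_job "$id" &
done
wait
cat "$LOG"
```

The jobs may finish in any order, but every "start" line is immediately followed by the matching "end" line. Because only the processing sits inside the locked subshell, the foreground scan in scan.sh never waits on it.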
Here is 01-scan.sh:
#!/bin/bash
BASE="/tmp"

if [ -z "$1" ]; then
    echo "Usage: $0 <jobid>"
    echo
    echo "Please provide unique jobid name as first parameter"
    exit 1
fi

OUTPUT="$BASE/$1"
mkdir -p "$OUTPUT"
echo 'scanning...'
scanimage --resolution 300 \
    --batch="$OUTPUT/scan_%03d.pnm" \
    --format=pnm \
    --mode Gray \
    --source 'ADF Duplex'
echo "Output in $OUTPUT/scan*.pnm"
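The `%03d` in the batch pattern matters more than it looks. scanimage fills in the page number printf-style, and the zero padding keeps the later `scan_*.pnm` glob in page order once a document reaches ten or more pages. This small demo (empty files in a temp directory, byte-order sorting forced with LC_ALL=C) compares padded and unpadded names.

```bash
#!/bin/bash
# Why --batch uses scan_%03d.pnm: the zero padding keeps the glob scan_*.pnm
# (which 02-ocrmypdf.sh hands to img2pdf) in page order past page 9.
# This demo just creates empty files in a temporary directory.
export LC_ALL=C                      # plain byte-order sorting for the globs
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

for page in 1 2 9 10 11; do
    : > "$(printf 'scan_%03d.pnm' "$page")"   # padded, as in 01-scan.sh
    : > "$(printf 'nopad_%d.pnm' "$page")"    # unpadded, for comparison
done

echo scan_*.pnm    # scan_001.pnm scan_002.pnm scan_009.pnm scan_010.pnm scan_011.pnm
echo nopad_*.pnm   # nopad_1.pnm nopad_10.pnm nopad_11.pnm nopad_2.pnm nopad_9.pnm
```

Without the padding, page 10 would land between pages 1 and 2 in the final PDF.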
Next is 02-ocrmypdf.sh, which removes blank pages, combines the remaining scans into a PDF and runs OCR on it:
#!/bin/bash
LANGUAGE="eng" # the tesseract language
BASE="/tmp"
export LC_ALL=C.UTF-8
export LANG=C.UTF-8

if [ -z "$1" ]; then
    echo "Usage: $0 <jobid>"
    echo
    echo "Please provide existing jobid as first parameter"
    exit 1
fi

OUTPUT="$BASE/$1"
if [ ! -d "$OUTPUT" ]; then
    echo "jobid does not exist"
    exit 1
fi
cd "$OUTPUT" || exit 1

# check if the page is blank
# http://philipp.knechtges.com/?p=190
echo 'checking for blank pages...'
for i in scan_*.pnm; do
    echo "${i}"
    histogram=$(convert "${i}" -threshold 50% -format %c histogram:info:-)
    white=$(echo "${histogram}" | grep "#FFFFFF" | sed -n 's/^ *\(.*\):.*$/\1/p')
    black=$(echo "${histogram}" | grep "#000000" | sed -n 's/^ *\(.*\):.*$/\1/p')
    # a colour missing from the histogram means a count of 0;
    # an all-black page (no white pixels) is never treated as blank
    white=${white:-0}
    black=${black:-0}
    [ "${white}" -eq 0 ] && continue
    blank=$(echo "scale=4; ${black}/${white} < 0.005" | bc)
    if [ "${blank}" -eq 1 ]; then
        echo "${i} seems to be blank - removing it..."
        rm "${i}"
    fi
done

# stop if every page was removed as blank (nothing left to combine)
if ! compgen -G 'scan_*.pnm' > /dev/null; then
    echo "no non-blank pages found"
    exit 1
fi

echo 'Combining scans into PDF ... '
img2pdf scan_*.pnm -o intermediate.pdf
echo 'Performing OCR ... '
ocrmypdf --tesseract-timeout 300 --tesseract-oem 1 --deskew --clean --verbose -l "$LANGUAGE" --sidecar "${1}.txt" intermediate.pdf "${1}.pdf"
echo "created $OUTPUT/${1}.pdf"
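The blank-page check is the fiddliest part of that script, so here it is pulled out into a function you can test on its own. It takes the text printed by the `convert ... histogram:info:-` call and treats a page as blank when black pixels are under 0.5% of the white ones, the same rule as above. Two differences are my own choices: awk does the floating-point comparison instead of bc, and page_is_blank is a hypothetical name.

```bash
#!/bin/bash
# Sketch of the blank-page test in 02-ocrmypdf.sh as a standalone function.
# Input: text printed by `convert page.pnm -threshold 50% -format %c histogram:info:-`.
# A page counts as blank when black pixels are under THRESHOLD (default 0.005)
# of the white pixels. Missing colours count as 0; awk replaces bc here.
page_is_blank() {
    local histogram="$1" threshold="${2:-0.005}" white black
    # Each histogram line looks like "   12345: (...) #FFFFFF ..."; keep the count.
    white=$(grep '#FFFFFF' <<< "$histogram" | sed -n 's/^ *\([0-9]*\):.*$/\1/p')
    black=$(grep '#000000' <<< "$histogram" | sed -n 's/^ *\([0-9]*\):.*$/\1/p')
    # Exit status 0 (blank) only if there are white pixels and the ratio is small;
    # && short-circuits, so an all-black page never divides by zero.
    awk -v b="${black:-0}" -v w="${white:-0}" -v t="$threshold" \
        'BEGIN { exit !(w > 0 && b / w < t) }'
}
```

Inside the loop, this would replace the white/black/blank lines with `if page_is_blank "$(convert "${i}" -threshold 50% -format %c histogram:info:-)"; then rm "${i}"; fi`.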
Lastly, 03-nccopy.sh just does a WebDAV upload to my Nextcloud instance with a curl command:
curl -X PUT -u "$USERNAME:$PASS" --data-binary @"$LOCAL" "$URI"
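For anyone wanting a starting point, below is a hypothetical sketch of a whole 03-nccopy.sh built around that curl command. It is not my actual script. It assumes the PDF sits at /tmp/<jobid>/<jobid>.pdf (where 02-ocrmypdf.sh writes it) and that uploads go to Nextcloud’s per-user WebDAV endpoint under remote.php/dav/files/. NC_HOST, NC_USER, NC_PASS and NC_FOLDER are placeholder settings, and DRY_RUN=1 prints the command instead of running it.

```bash
#!/bin/bash
# Hypothetical sketch of 03-nccopy.sh around the curl WebDAV upload above.
# Assumptions: the OCRed PDF is /tmp/<jobid>/<jobid>.pdf, and the target is
# Nextcloud's per-user WebDAV endpoint. All NC_* values are placeholders.
# With DRY_RUN=1 the curl command is printed instead of executed.
NC_HOST="${NC_HOST:-https://cloud.example.com}"
NC_USER="${NC_USER:-pi}"
NC_PASS="${NC_PASS:-changeme}"
NC_FOLDER="${NC_FOLDER:-Scans}"

upload_job() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "Usage: upload_job <jobid>" >&2
        return 1
    fi
    local LOCAL="/tmp/$jobid/$jobid.pdf"
    local URI="$NC_HOST/remote.php/dav/files/$NC_USER/$NC_FOLDER/$jobid.pdf"
    # --fail makes curl exit non-zero on HTTP errors such as 401 or 404
    local cmd=(curl --fail -X PUT -u "$NC_USER:$NC_PASS" --data-binary @"$LOCAL" "$URI")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# In the real script this would be called as: upload_job "$1"
```

Passing the password with -u makes it visible in the process list; curl’s --netrc option, which reads credentials from ~/.netrc, is one way around that.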