### Install
```bash
sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-eng      # English trained data
sudo apt-get install tesseract-ocr-chi-tra  # Traditional Chinese trained data
```

### Execute
```bash
tesseract image.tiff output --psm 6 -l eng
tesseract image_file output_file -l chi_tra+eng
```
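In the first command, `image.tiff` is the image to recognize and `output` is the base name of the result file (Tesseract appends `.txt`, so the text ends up in `output.txt`). `--psm 6` selects page segmentation mode 6, which treats the image as a single uniform block of text, and `-l eng` selects English as the recognition language.
Tesseract also supports other languages: pass the matching language code to `-l`, and join several codes with `+`, as in `chi_tra+eng` in the second command.
Once the command finishes, the recognized text is written to the specified output file.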
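
The same command line can also be assembled and launched from Java. The following is a minimal sketch and not part of the original notes: the class name `TesseractCommand` and its `build` and `run` helpers are hypothetical, and `run` assumes the `tesseract` binary is on the `PATH`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TesseractCommand {

    // Builds: tesseract <image> <outputBase> --psm <psm> -l <lang1+lang2+...>
    static List<String> build(String image, String outputBase, int psm, String... languages) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("--psm");
        cmd.add(Integer.toString(psm));
        cmd.add("-l");
        cmd.add(String.join("+", languages));
        return cmd;
    }

    // Launches the command and returns its exit code (0 means success).
    // Assumption: the tesseract executable can be found on the PATH.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = build("image.tiff", "output", 6, "eng");
        // Prints: tesseract image.tiff output --psm 6 -l eng
        System.out.println(String.join(" ", cmd));
        // Call run(cmd) here to actually execute it once tesseract is installed.
    }
}
```

Passing the arguments as a list, rather than as one concatenated string, avoids shell quoting problems when file names contain spaces.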

### Download model
Download the required language data files from the official Tesseract OCR GitHub repository:
```bash
git clone https://github.com/tesseract-ocr/tessdata.git
```
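
The cloned `tessdata` directory holds one `<code>.traineddata` file per language, for example `eng.traineddata` and `chi_tra.traineddata`. As a rough sketch, a check like the one below could confirm that every requested language is present before the directory is used as the data path. The class `TessdataCheck` and the method `missingLanguages` are hypothetical names introduced only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class TessdataCheck {

    // Takes a "+"-joined language list such as "eng+chi_tra" and returns the
    // codes whose <code>.traineddata file is missing from the given directory.
    static List<String> missingLanguages(Path tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        for (String code : languages.split("\\+")) {
            if (!Files.isRegularFile(tessdataDir.resolve(code + ".traineddata"))) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults to ./tessdata, the directory name created by the git clone above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        System.out.println("Missing: " + missingLanguages(dir, "eng+chi_tra"));
    }
}
```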

### Spring Config
```java
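import net.sourceforge.tess4j.Tesseract;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OcrConfig {
    @Bean
    public Tesseract tesseract() {
        Tesseract tesseract = new Tesseract();
        // Directory that contains the *.traineddata files
        tesseract.setDatapath("/path/to/tessdata");
        // Languages to recognize, joined with "+"
        tesseract.setLanguage("eng+chi_tra");
        return tesseract;
    }
}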
```

### Spring Service
```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class OcrService {
    @Autowired
    private ITesseract tesseract;

    // Fetch the image at the given URL, decode it into a BufferedImage, then run OCR on it.
    // MalformedURLException is a subclass of IOException, so it needs no separate throws entry.
    public String recognizeImage(String imageUrl) throws TesseractException, IOException {
        // try-with-resources closes the stream; ImageIO.read does not close it itself.
        try (InputStream inputStream = new URL(imageUrl).openStream()) {
            BufferedImage image = ImageIO.read(inputStream);
            return tesseract.doOCR(image);
        }
    }
}
```
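
`recognizeImage` hands the result of `ImageIO.read` straight to `doOCR`, but `ImageIO.read` returns `null` when no registered reader can decode the data (for example, when the URL serves an HTML error page instead of an image). One possible guard is sketched below. `ImageLoader` and `load` are hypothetical names, not part of Tess4J or the original service.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

class ImageLoader {

    // Decodes an image from the stream and fails with an IOException
    // instead of returning null when the data cannot be decoded.
    static BufferedImage load(InputStream in) throws IOException {
        BufferedImage image = ImageIO.read(in);
        if (image == null) {
            throw new IOException("Unsupported or corrupt image data");
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a blank 4x2 PNG through memory to exercise the happy path.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB), "png", buffer);
        BufferedImage decoded = load(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // prints 4x2
    }
}
```

In the service, the call `ImageIO.read(inputStream)` could be swapped for `ImageLoader.load(inputStream)` so that a bad download surfaces as an `IOException` rather than a failure inside the OCR call.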
