SFKs committed on
Commit b51ea12 · verified · 1 Parent(s): b263efc

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -1,3 +1,8 @@
+---
+license: apache-2.0
+datasets:
+- opendatalab/AICC
+---
 # Dripper(MinerU-HTML)
 
 **Dripper(MinerU-HTML)** is an advanced HTML main content extraction tool based on Large Language Models (LLMs). It provides a complete pipeline for extracting primary content from HTML pages using LLM-based classification and state machine-guided generation.
@@ -328,5 +333,4 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 - Built on top of [vLLM](https://github.com/vllm-project/vllm) for efficient LLM inference
 - Uses [Trafilatura](https://github.com/adbar/trafilatura) for fallback extraction
 - Finetuned on [Qwen3](https://github.com/QwenLM/Qwen3)
-- Inspired by various HTML content extraction research
-
+- Inspired by various HTML content extraction research