---
datasets:
- RUC-DataLab/DataScience-Instruct-500K
license: mit
pipeline_tag: text-generation
library_name: transformers
---

<p align="center" width="100%">
<img src="assets/logo.png" alt="DeepAnalyze" style="width: 60%; min-width: 300px; display: block; margin: auto;">
</p>

# DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

[arXiv](https://arxiv.org/abs/2510.16872)
[Hugging Face Paper](https://huggingface.co/papers/2510.16872)
[GitHub](https://github.com/ruc-datalab/DeepAnalyze)
[Project Page](https://ruc-deepanalyze.github.io/)
[Model](https://huggingface.co/RUC-DataLab/DeepAnalyze-8B)
[Dataset](https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K)

> **Authors**: **[Shaolei Zhang](https://zhangshaolei1998.github.io/), [Ju Fan*](http://iir.ruc.edu.cn/~fanj/), [Meihao Fan](https://scholar.google.com/citations?user=9RTm2qoAAAAJ), [Guoliang Li](https://dbgroup.cs.tsinghua.edu.cn/ligl/), [Xiaoyong Du](http://info.ruc.edu.cn/jsky/szdw/ajxjgcx/jsjkxyjsx1/js2/7374b0a3f58045fc9543703ccea2eb9c.htm)**

**DeepAnalyze** is the first agentic LLM for autonomous data science. It can complete a wide range of data-centric tasks without human intervention, supporting:
- 🛠 **Entire data science pipeline**: Automatically perform data science tasks such as data preparation, analysis, modeling, visualization, and report generation.
- 🔍 **Open-ended data research**: Conduct deep research on diverse data sources, including structured data (databases, CSV, Excel), semi-structured data (JSON, XML, YAML), and unstructured data (TXT, Markdown), and produce analyst-grade research reports.
- 📊 **Fully open-source**: The [model](https://huggingface.co/RUC-DataLab/DeepAnalyze-8B), [code](https://github.com/ruc-datalab/DeepAnalyze), [training data](https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K), and [demo](https://huggingface.co/RUC-DataLab/DeepAnalyze-8B) of DeepAnalyze are all open-sourced, so you can deploy or extend your own data analysis assistant.

<p align="center" width="100%">
<img src="./assets/deepanalyze.jpg" alt="deepanalyze" style="width: 70%; min-width: 300px; display: block; margin: auto;">
</p>

For more information, refer to [DeepAnalyze's Repo](https://github.com/ruc-datalab/DeepAnalyze).
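
The card metadata lists `transformers` as the library and `text-generation` as the pipeline tag, so the model can presumably be loaded like any other causal language model. The following is a minimal sketch under that assumption, not the authors' official inference code. The `build_prompt` helper and its prompt layout are hypothetical and exist only for illustration; the repository linked above documents the prompt format the model was actually trained on. The `device_map="auto"` argument additionally assumes the `accelerate` package is installed.

```python
"""Minimal sketch: text generation with DeepAnalyze-8B via Hugging Face transformers."""

MODEL_ID = "RUC-DataLab/DeepAnalyze-8B"  # model repository linked in this card


def build_prompt(instruction: str, data_files: list[str]) -> str:
    """Hypothetical prompt layout (an assumption, not the official template)."""
    file_lines = "\n".join(f"- {name}" for name in data_files)
    return f"Instruction:\n{instruction}\n\nData files:\n{file_lines}\n"


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model and return only the newly generated text."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # assumption: `accelerate` is installed
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(build_prompt("Summarize sales trends by region.", ["sales.csv"]))` downloads the 8B checkpoint on first use, so it needs sufficient disk space and memory. The file name here is a placeholder.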