TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis

Haokun Zhao¹˒² ★     Xiang Zhang³ ★     Jiaqi Wei⁴ ★     Yiwei Xu⁵
Yuting He⁶     Siqi Sun⁷     Chenyu You¹

¹Stony Brook University    ²University of California, San Diego    ³University of British Columbia
⁴Zhejiang University    ⁵University of California, Los Angeles
⁶Case Western Reserve University    ⁷Fudan University

★ Equal Contribution

Performance comparison of TSci with five LLM-based baselines. TSci outperforms LLM-based baselines on eight benchmarks spanning five domains (Left). The comprehensive report generated by TSci outperforms LLM-based baselines across five rubrics (Right).

Abstract

Time series forecasting is central to decision-making in domains as diverse as energy, finance, climate, and public health. In practice, forecasters face thousands of short, noisy series that vary in frequency, quality, and horizon, where the dominant cost lies not in model fitting but in the labor-intensive preprocessing, validation, and ensembling required to obtain reliable predictions. Prevailing statistical and deep learning models are tailored to specific datasets or domains and generalize poorly. A general, domain-agnostic framework that minimizes human intervention is urgently needed. In this paper, we introduce TimeSeriesScientist (TSci), the first LLM-driven agentic framework for general time series forecasting. The framework comprises four specialized agents: Curator performs LLM-guided diagnostics, augmented by external tools that reason over data statistics, to choose targeted preprocessing; Planner narrows the hypothesis space of model choice by leveraging multi-modal diagnostics and self-planning over the input; Forecaster performs model fitting and validation and, based on the results, adaptively selects the best model configuration and ensemble strategy to make final predictions; and Reporter synthesizes the whole process into a comprehensive, transparent report. With transparent natural-language rationales and comprehensive reports, TSci transforms the forecasting workflow into a white-box system that is both interpretable and extensible across tasks. Empirical results on eight established benchmarks demonstrate that TSci consistently outperforms both statistical and LLM-based baselines, reducing forecast error by an average of 10.4% and 38.2%, respectively. Moreover, TSci produces a clear and rigorous report that makes the forecasting workflow more transparent and interpretable.

Method Overview

Curator performs LLM-guided diagnostics augmented by external tools that reason over data statistics to choose targeted preprocessing.

Planner narrows the hypothesis space of model choice by leveraging multi-modal diagnostics and self-planning over the input.

Forecaster performs model fitting and validation and, based on the results, adaptively selects the best model configuration and ensemble strategy to make final predictions.

Reporter synthesizes the whole process into a comprehensive, transparent report.
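The four stages can be pictured with a small, LLM-free sketch. Everything in it is a hypothetical stand-in rather than the TSci implementation: the function names (`curate`, `plan`, `forecast`, `report`), the interquartile-range clipping rule, the three toy candidate models, and the top-2 averaging ensemble are all assumptions chosen only to make the data flow between agents concrete.

```python
import numpy as np

def curate(series):
    """Curator stand-in: interpolate gaps, then clip extreme values (IQR rule, an assumption)."""
    x = np.asarray(series, dtype=float).copy()
    idx = np.arange(len(x))
    missing = np.isnan(x)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    clipped = np.clip(x, q1 - 3 * iqr, q3 + 3 * iqr)
    notes = {"n_missing": int(missing.sum()), "n_clipped": int((clipped != x).sum())}
    return clipped, notes

def plan(series, season):
    """Planner stand-in: shortlist candidate models from a simple length diagnostic."""
    candidates = ["naive", "mean"]
    if len(series) >= 2 * season:
        candidates.append("seasonal_naive")
    return candidates

def predict(name, history, horizon, season):
    """Toy candidate models (placeholders for real statistical or deep models)."""
    if name == "naive":
        return np.full(horizon, history[-1])
    if name == "mean":
        return np.full(horizon, history.mean())
    if name == "seasonal_naive":
        last = history[-season:]
        return np.array([last[i % season] for i in range(horizon)])
    raise ValueError(f"unknown model: {name}")

def forecast(series, candidates, horizon, season, top_k=2):
    """Forecaster stand-in: score candidates on a held-out tail, average the top_k."""
    train, val = series[:-horizon], series[-horizon:]
    scores = {m: float(np.mean(np.abs(predict(m, train, horizon, season) - val)))
              for m in candidates}
    chosen = sorted(scores, key=scores.get)[:top_k]
    preds = np.mean([predict(m, series, horizon, season) for m in chosen], axis=0)
    return preds, scores, chosen

def report(notes, candidates, scores, chosen):
    """Reporter stand-in: summarize each stage in plain text."""
    lines = [
        f"Curator: filled {notes['n_missing']} missing value(s), "
        f"clipped {notes['n_clipped']} outlier(s).",
        f"Planner: shortlisted {', '.join(candidates)}.",
        "Forecaster: validation MAE "
        + ", ".join(f"{m}={scores[m]:.3f}" for m in candidates) + ".",
        f"Forecaster: ensembled {', '.join(chosen)} by simple averaging.",
    ]
    return "\n".join(lines)

# Synthetic demo series (not from any benchmark): a sine wave with one gap and one spike.
t = np.arange(48)
raw = 10 + 5 * np.sin(2 * np.pi * t / 12)
raw[5] = np.nan
raw[20] = 100.0

clean, notes = curate(raw)
candidates = plan(clean, season=12)
preds, scores, chosen = forecast(clean, candidates, horizon=6, season=12)
summary = report(notes, candidates, scores, chosen)
print(summary)
```

In this toy setup each stage passes a small artifact to the next (cleaned data and notes, a candidate list, validation scores), which is what lets the final stage explain every decision in the report.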

Evaluation Metrics

Summary of datasets across different domains.

We propose five rubrics for evaluating the generated report along two dimensions: AS and MJ assess the technical rigor of the analysis and modeling choices, while IC, AQ, and SC assess the report's communication quality and practical usefulness.

Main Results

Performance Comparison Table

Time series forecasting results compared with five LLM-based baselines. Lower values indicate better performance. Red: the best, Blue: the second best.
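For readers unfamiliar with lower-is-better error metrics, the following minimal sketch shows how mean absolute error (MAE), mean squared error (MSE), and a relative error reduction can be computed. The helper names are hypothetical, and the relative-reduction formula is one common convention; this page does not state exactly how the reported averages were aggregated.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_true - y_pred|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean squared error: average of (y_true - y_pred)^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error (assumed convention)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Toy values for illustration only (not results from the paper).
print(mae([1, 2, 3], [1, 2, 5]))
print(mse([1, 2, 3], [1, 2, 5]))
print(relative_reduction(0.5, 0.4))
```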

Performance Comparison Line Chart

Performance comparison of TSci with five LLM-based baselines across eight datasets.

Performance Comparison Table

Win rate (%) of TSci against five LLM-based baselines across five rubrics.
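A win rate of this kind can be computed from pairwise judgments as sketched below. The outcome labels, the `win_rate` helper, and the choice to count ties as non-wins are assumptions made for illustration; the judgments shown are placeholders, not data from the evaluation.

```python
from collections import Counter

def win_rate(outcomes):
    """Percentage of comparisons labeled 'win'; ties and losses both count as non-wins."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons to score")
    return 100.0 * counts["win"] / total

# Placeholder pairwise judgments per rubric (illustrative only).
judgments = {
    "AS": ["win", "win", "loss", "tie"],
    "MJ": ["win", "win", "win", "loss"],
}
rates = {rubric: win_rate(outcomes) for rubric, outcomes in judgments.items()}
print(rates)
```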

Ablation Study

We conduct an ablation study to isolate the contributions of the four agents in our TSci framework: Curator, Planner, Forecaster, and Reporter.

Ablation study of TSci with three variants: w/o Data Pre-process, w/o Data Analysis, and w/o Parameter Optimization. TSci attains the lowest MAE in six of the eight settings.
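A tally such as "lowest MAE in k of n settings" can be reproduced from a per-setting table of errors, as in the sketch below. The setting names, variant keys, and MAE values are placeholders invented for illustration, and `count_best` is a hypothetical helper, not part of TSci.

```python
def count_best(results, method):
    """Count settings in which `method` attains the minimum MAE (ties count as best)."""
    wins = 0
    for maes in results.values():
        if maes[method] == min(maes.values()):
            wins += 1
    return wins

# Placeholder MAE values per setting and variant (not results from the paper).
results = {
    "setting_a": {"full": 0.30, "no_preprocess": 0.35, "no_analysis": 0.33, "no_param_opt": 0.32},
    "setting_b": {"full": 0.41, "no_preprocess": 0.45, "no_analysis": 0.40, "no_param_opt": 0.44},
    "setting_c": {"full": 0.22, "no_preprocess": 0.29, "no_analysis": 0.25, "no_param_opt": 0.24},
}
n_best = count_best(results, "full")
print(f"full pipeline is best in {n_best} of {len(results)} settings")
```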

BibTeX

@misc{zhao2025timeseriesscientistgeneralpurposeaiagent,
    title={TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis},
    author={Haokun Zhao and Xiang Zhang and Jiaqi Wei and Yiwei Xu and Yuting He and Siqi Sun and Chenyu You},
    year={2025},
    eprint={2510.01538},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2510.01538},
}