Skip to content

THETA Topic Model

Advanced Topic Modeling with Qwen Embeddings


THETA is a state-of-the-art topic modeling framework that leverages Qwen3-Embedding models to achieve superior performance in topic discovery and analysis. Designed as an improvement over traditional topic models like LDA and ETM, THETA combines the power of large language model embeddings with advanced neural topic modeling architectures.

  • Getting Started


    Install THETA and train your first topic model in minutes

    Quick Start

  • User Guide


    Complete workflow from data preparation to result analysis

    User Guide

  • Models


    Architecture details of THETA and baseline models

    Models

  • API Reference


    Complete parameter documentation for all CLI tools

    API Reference


Key Features

Feature Description
Powerful Embeddings Built on Qwen3-Embedding (0.6B / 4B / 8B) for superior semantic understanding
Flexible Training Zero-shot, supervised, and unsupervised modes
Rich Visualizations Topic distributions, heatmaps, UMAP projections, pyLDAvis
Multilingual Full support for English and Chinese data
Extensible Easy customization with new datasets and configurations
Comprehensive Evaluation TD, TC, NPMI, and more metrics

Model Comparison

Model Embedding Type Characteristics
THETA Qwen3-Embedding Neural Our method — best performance
LDA Probabilistic Classic generative model
ETM Word2Vec Neural Embedded topic model
CTM SBERT Neural Contextualized model
DTM SBERT Neural Dynamic temporal model

Quick Example

# 1. Preprocess data
python prepare_data.py \
    --dataset 20ng \
    --model theta \
    --model_size 0.6B \
    --mode zero_shot \
    --vocab_size 5000 \
    --gpu 0

# 2. Train model
python run_pipeline.py \
    --dataset 20ng \
    --models theta \
    --model_size 0.6B \
    --mode zero_shot \
    --num_topics 20 \
    --epochs 100 \
    --gpu 0

Citation

If you use THETA in your research, please cite:

@article{theta2025,
  title={THETA: Advanced Topic Modeling with Qwen Embeddings},
  author={CodeSoul},
  year={2025}
}