Cole McIntosh

AI & Full Stack Engineer

Building Halcyon-1B: A Creative Writing Model

Halcyon-1B is my venture into developing a specialized language model for creative writing. Built on Google's Gemma 3 1B, the model has been fine-tuned to excel at storytelling, literary exploration, and nuanced narrative construction.

The Vision

The inspiration for Halcyon-1B came from recognizing a gap in the current landscape of language models. While many models excel at general tasks, there was an opportunity to create a specialized model that could better understand and generate creative writing with enhanced literary sensibility.

Technical Implementation

Base Model Selection

I chose unsloth/gemma-3-1b-it-unsloth-bnb-4bit as the foundation for several reasons (see the loading sketch after this list):

  • Efficiency: 4-bit quantization sharply reduces memory use, enabling faster training and inference with minimal quality loss
  • Base Performance: Gemma 3 1B demonstrates strong language understanding for a model of its size
  • Resource Optimization: Unsloth's optimizations enable better performance on consumer hardware
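
For concreteness, loading this base with Unsloth and attaching LoRA adapters looks roughly like the sketch below. The LoRA rank and related hyperparameters are illustrative placeholders, not the exact values used for Halcyon-1B.

from unsloth import FastModel

# Load the 4-bit quantized Gemma 3 1B instruct base
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters so only a small fraction of weights is trained
# (r and lora_alpha are illustrative, not the actual values)
model = FastModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
)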

Training Methodology

The training process leveraged two key technologies, which come together in the sketch after this list:

  1. Unsloth: Used for 2x faster training through optimized kernels and memory management
  2. Hugging Face's TRL Library: Employed for efficient fine-tuning and training management
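
Here is a minimal sketch of how the two fit together, assuming the model and tokenizer loaded above and a chat-formatted dataset (prepared in the next section). Every hyperparameter is an illustrative placeholder rather than the actual training configuration.

from trl import SFTTrainer, SFTConfig

# Unsloth patches TRL's trainer under the hood to deliver the speedup;
# the values below are placeholders, not the real run configuration
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,  # chat-formatted dataset, see next section
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        num_train_epochs = 1,
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()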

Dataset Curation

The model was trained on the Creative Writing ShareGPT dataset (prepared roughly as in the sketch after this list), which provides:

  • High-quality creative writing examples
  • Diverse writing styles and genres
  • Natural language interactions focused on storytelling
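
Preparing such a dataset might look like the sketch below, assuming it follows the usual ShareGPT schema ("conversations" turns with "from"/"value" keys). The Hub path is a placeholder, not the dataset's actual ID.

from datasets import load_dataset

# Placeholder path; substitute the real Hub ID of the dataset
dataset = load_dataset("your-org/creative-writing-sharegpt", split = "train")

# Convert ShareGPT "from"/"value" turns into "role"/"content" messages,
# then render each conversation with the Gemma-3 chat template
# (tokenizer comes from the FastModel.from_pretrained call above)
def to_text(example):
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    messages = [
        {"role": role_map[turn["from"]], "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize = False)}

dataset = dataset.map(to_text)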

Model Capabilities

Halcyon-1B excels in several creative writing tasks:

  • Narrative Generation: Creating coherent and engaging stories
  • Style Adaptation: Matching various literary styles and tones
  • Character Development: Crafting consistent and compelling characters
  • Plot Construction: Developing structured and meaningful narratives

Usage Example

Here's how to use Halcyon-1B in your projects:

from unsloth import FastModel
from transformers import TextStreamer

# Load model and tokenizer
model, tokenizer = FastModel.from_pretrained(
    model_name = "colesmcintosh/Halcyon-1B",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Format prompt using Gemma-3 chat template
messages = [{
    "role": "user",
    "content": [{"type" : "text", "text" : "Write a mythological tale about how the oceans came to be."}]
}]

text_str = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,  # return the formatted prompt string directly
)

# Generate response
outputs = model.generate(
    **tokenizer([text_str], return_tensors="pt").to("cuda"),
    max_new_tokens=64,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
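
A note on the settings above: temperature 1.0, top_p 0.95, and top_k 64 match the sampling values commonly recommended for Gemma 3 models, and max_new_tokens=64 keeps the demo output short; raise it substantially if you want a complete tale.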

Future Development

Looking ahead, I plan to:

  1. Expand Training Data: Incorporate more diverse creative writing styles
  2. Optimize Performance: Further tune the model for specific creative tasks
  3. Community Integration: Develop tools and plugins for popular writing platforms

Conclusion

Halcyon-1B is a step forward for specialized creative-writing models. By focusing on a single domain, even a small model can push the boundaries of AI-assisted writing while remaining efficient and accessible.

The model is open source and available on Hugging Face, and I welcome contributions and feedback from the community.