Building Halcyon-1B: A Creative Writing Model
Halcyon-1B is my venture into building a specialized language model for creative writing. Built on Google's Gemma 3 1B, the model has been fine-tuned to excel at storytelling, literary exploration, and nuanced narrative construction.
The Vision
The inspiration for Halcyon-1B came from a gap in the current model landscape: plenty of models handle general tasks well, but few are tuned specifically for creative writing. That left room for a specialized model with sharper literary sensibility.
Technical Implementation
Base Model Selection
I chose unsloth/gemma-3-1b-it-unsloth-bnb-4bit, Unsloth's 4-bit quantization of the instruction-tuned Gemma 3 1B, as the foundation for several reasons (a loading sketch follows the list):
- Efficiency: The 4-bit quantization sharply reduces memory use, making training and inference practical on modest hardware with little quality loss
- Base Performance: Gemma 3 1B demonstrates strong language understanding for its size
- Resource Optimization: Unsloth's optimizations enable better performance on consumer hardware
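To ground this, here's a minimal sketch of loading the base model with Unsloth before fine-tuning; the sequence length is simply the same value used in the inference example later on:

from unsloth import FastModel

# Load the 4-bit quantized, instruction-tuned Gemma 3 1B as the fine-tuning base
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    max_seq_length = 2048,  # context window for training; matches the inference example
    load_in_4bit = True,    # keep the bitsandbytes 4-bit weights
)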
Training Methodology
The training process leveraged two key technologies (a combined sketch follows the list):
- Unsloth: Used for 2x faster training through optimized kernels and memory management
- Hugging Face's TRL Library: Supplies the supervised fine-tuning (SFT) loop and training management
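As a sketch of how the two fit together: Unsloth attaches LoRA adapters to the quantized base, and TRL's SFTTrainer runs the supervised fine-tuning loop. Every hyperparameter below is an illustrative assumption rather than Halcyon-1B's exact recipe, and the snippet assumes the model and tokenizer loaded above plus the dataset prepared in the next section:

from trl import SFTConfig, SFTTrainer
from unsloth import FastModel

# Attach LoRA adapters (rank/alpha values here are illustrative)
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers = False,  # Gemma 3 1B is text-only
    finetune_language_layers = True,
    finetune_attention_modules = True,
    finetune_mlp_modules = True,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
)

# TRL drives the supervised fine-tuning loop over the rendered "text" field
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,    # newer TRL releases name this argument processing_class
    train_dataset = dataset,  # see the preparation sketch in the next section
    args = SFTConfig(
        dataset_text_field = "text",
        max_seq_length = 2048,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()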
Dataset Curation
The model was trained on the Creative Writing ShareGPT dataset (see the preparation sketch after this list), which provides:
- High-quality creative writing examples
- Diverse writing styles and genres
- Natural language interactions focused on storytelling
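Here's a hedged sketch of how a ShareGPT-style dataset can be rendered with the Gemma 3 chat template; the dataset path below is a placeholder, not the real location of the Creative Writing ShareGPT data:

from datasets import load_dataset
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

# Placeholder path: substitute the actual Creative Writing ShareGPT dataset
dataset = load_dataset("your-org/creative-writing-sharegpt", split = "train")

# Convert ShareGPT's {"from": ..., "value": ...} turns into role/content messages
dataset = standardize_sharegpt(dataset)

# Render each conversation into a single "text" string with the Gemma 3 template
tokenizer = get_chat_template(tokenizer, chat_template = "gemma-3")

def to_text(batch):
    texts = [tokenizer.apply_chat_template(convo, tokenize = False)
             for convo in batch["conversations"]]
    return {"text": texts}

dataset = dataset.map(to_text, batched = True)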
Model Capabilities
Halcyon-1B excels in several creative writing tasks:
- Narrative Generation: Creating coherent and engaging stories
- Style Adaptation: Matching various literary styles and tones (see the prompt sketch after this list)
- Character Development: Crafting consistent and compelling characters
- Plot Construction: Developing structured and meaningful narratives
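Style adaptation is carried entirely by the prompt; a hypothetical request in the same chat format used in the usage example below might read:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Retell the fall of Icarus as hard-boiled noir, in under 300 words."}]
}]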
Usage Example
Here's how to use Halcyon-1B in your projects:
from unsloth import FastModel
from transformers import TextStreamer

# Load model and tokenizer
model, tokenizer = FastModel.from_pretrained(
    model_name = "colesmcintosh/Halcyon-1B",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Format the prompt with the Gemma 3 chat template and tokenize in one step
# (avoids the decode/re-encode round trip, which duplicates special tokens)
messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Write a mythological tale about how the oceans came to be."}]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_dict = True,
    return_tensors = "pt",
).to("cuda")

# Generate and stream the response token by token
outputs = model.generate(
    **inputs,
    max_new_tokens = 64,  # increase for longer stories
    temperature = 1.0,
    top_p = 0.95,
    top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)
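If you'd rather capture the story as a string than stream it to the console, drop the streamer and decode only the newly generated tokens; this reuses the inputs built above:

# Generate without streaming, then slice off the prompt before decoding
outputs = model.generate(
    **inputs,
    max_new_tokens = 256,
    temperature = 1.0,
    top_p = 0.95,
    top_k = 64,
)
story = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens = True,
)
print(story)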
Future Development
Looking ahead, I plan to:
- Expand Training Data: Incorporate more diverse creative writing styles
- Optimize Performance: Further tune the model for specific creative tasks
- Community Integration: Develop tools and plugins for popular writing platforms
Conclusion
Halcyon-1B represents a step forward in specialized language models for creative writing. Focusing on a single domain makes it possible to push the boundaries of AI-assisted creative writing while keeping the model efficient and accessible.
The model is open source and available on Hugging Face, and I welcome contributions and feedback from the community.