Great news! We’ve released two pretrained PictSure models on Hugging Face 🤗 — making it easier than ever to try out vision-only in-context learning (ICL).
## What’s included
- PictSure-ResNet18
  A compact 53M-parameter model using frozen ResNet18 embeddings.
- PictSure-ViT-Triplet
  A 128M-parameter Vision Transformer variant pretrained with a triplet loss, yielding a more structured embedding space and stronger out-of-domain generalization (see the sketch right after this list).
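If you haven’t worked with triplet objectives before: the loss pulls embeddings of same-class images together and pushes different-class embeddings apart. Here is a minimal sketch of that idea using PyTorch’s built-in `TripletMarginLoss` (the tiny encoder and random tensors are placeholders for illustration, not our actual pretraining code):

```python
import torch
import torch.nn as nn

# Toy stand-in for the ViT backbone; purely illustrative.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 256))

# Triplet loss: anchor and positive share a class, the negative does not.
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Dummy batches standing in for real images.
anchor = torch.randn(8, 3, 224, 224)    # images of some class
positive = torch.randn(8, 3, 224, 224)  # other images of the same class
negative = torch.randn(8, 3, 224, 224)  # images of a different class

loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()  # only during pretraining; the released models need no backward passes
```

Intuitively, this pulls same-class images together and pushes different classes apart, which is what gives the ViT-Triplet variant its more structured embedding space.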
Both models are plug-and-play: no fine-tuning, no backward passes. You can condition on a few labeled examples and directly classify new queries.
## Why it matters
With these Hugging Face releases, anyone can:
- Explore few-shot classification in vision without writing custom training loops.
- Benchmark PictSure on in-domain datasets like miniImageNet and out-of-domain datasets like Brain Tumor or OrganCMNIST (see the evaluation sketch after this list).
- Build on top of our models for research or applications in specialized domains.
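To make the benchmarking point concrete, here is a sketch of a few-shot evaluation loop built on the same two calls as the quick-start below. The `sample_episode` helper is hypothetical; replace it with your own episode sampler for miniImageNet, Brain Tumor, or OrganCMNIST:

```python
from PictSure import PictSure

# sample_episode is a hypothetical helper you supply: it should return a few
# labeled context images (PIL) plus held-out query images and labels drawn
# from the dataset you want to benchmark.
from my_benchmark_utils import sample_episode  # hypothetical module

model = PictSure.from_pretrained("pictsure/pictsure-vit")

correct = total = 0
for _ in range(100):  # average over 100 few-shot episodes
    ctx_images, ctx_labels, query_images, query_labels = sample_episode()
    model.set_context_images(ctx_images, ctx_labels)
    for image, label in zip(query_images, query_labels):
        correct += int(model.predict(image) == label)
        total += 1

print(f"Accuracy over {total} queries: {correct / total:.2%}")
```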
## Get started
Check them out on Hugging Face under the `pictsure` organization.
To get things started, here is a minimal example using the ViT model:
```python
from PictSure import PictSure
from PIL import Image

# Load the pretrained model from the Hugging Face Hub
model = PictSure.from_pretrained("pictsure/pictsure-vit")

# Prepare a few labeled context images
context_images = [
    Image.open("cat1.jpg"),
    Image.open("cat2.jpg"),
    Image.open("dog1.jpg"),
    Image.open("dog2.jpg"),
]
context_labels = [0, 0, 1, 1]  # 0 = cat, 1 = dog

# Condition the model on the context (no fine-tuning involved)
model.set_context_images(context_images, context_labels)

# Classify a new query image
test_image = Image.open("unknown_animal.jpg")
prediction = model.predict(test_image)
print(f"Predicted class: {prediction}")
```
## Next steps
- Add example notebooks and tutorials for quick few-shot experiments
- Provide a Hugging Face demo space with interactive tasks
- Extend the library with more pretrained backbones