Open challenge, submissions until June 1st, 2026

Design Bulle's algorithm

Bulle is opening its dataset and inviting researchers, students, engineers and non-profits to propose an evaluation metric and a recommendation algorithm that are at once performant, fair and non-divisive.

Why this challenge?

The recommendation algorithms of the major platforms remain black boxes whose effects (addiction, polarisation, the invisibilisation of small creators) are now well documented. Bulle made the opposite choice: a fully public algorithm, computed from a simple, auditable formula.

This challenge extends that logic. Rather than improving our algorithm internally, we are opening the data and inviting the community to propose something better. The selected proposal will lead to a deeper collaboration with Bulle to implement the algorithm in the platform.

Nine principles to follow

These principles define what a good algorithm looks like for Bulle. Your evaluation metric and your algorithm must objectively meet these criteria.

🌍

Generalist

Apart from subscriptions and interests, the algorithm recommends the same content to everyone. No filter bubble, no behavioural personalisation.

πŸ“ˆ

Engagement-correlated

Shares, reposts, quizzes and watch time are the best quality signals. The algorithm uses them, but as proportions of views, never as raw counts.

🀝

Non-divisive

No bonus for shock or provocation. Content that generates many rejects or dislikes is penalised, even if it draws views.

βš–οΈ

Fair

A new or small creator must be able to surface. The algorithm rotates creators (round-robin) instead of saturating the feed with the most visible ones.

πŸ”₯

Hot information

Today's publications get priority. The algorithm surfaces fresh content while it is still relevant, without freezing the feed on past hits.

🚫

Anti-spam

Mass posting or clickbait should not pay off. Signals are normalised by views, so quantity never compensates for poor quality.

⏳

Temporal decay

A publication loses relevance over time, without disappearing abruptly. No implicit shadow ban.

🎲

Second chance

The algorithm must be able to re-evaluate a publication based on its recent views, not just on cumulative history. A publication with a bad start can climb back.

πŸͺŸ

Transparent

The chosen algorithm must be expressible in a few formulas, and auditable by anyone. No opaque model trained on user data.

The dataset

A single CSV file, aggregated by day, containing all recommended publications (including those with zero views, to allow round-robin strategy evaluation).

Column            Type            Description
publication_id    uuid            Anonymised (stable uuid) publication identifier.
creator_id        uuid            Anonymised (stable uuid) creator identifier. Useful for round-robin and fairness.
creator_status    text            Creator status: semipro or pro.
published_at      datetime        Publication date and time, rounded to the hour (UTC).
date              date            Evaluation day (recommendation day).
category_name     text            Top-level category (e.g. Nature and environment).
subcategory_name  text            Sub-category (e.g. Ecological awareness).
type              text            Format: video, carrousel (image or mixed image/video), message (text).
total_views       integer         Cumulative views at the evaluated date (may be 0).
prop_bookmarks    real ∈ [0,1]    Bookmarks ÷ views.
prop_shares       real ∈ [0,1]    Shares ÷ views.
prop_reposts      real ∈ [0,1]    Reposts ÷ views.
prop_quiz         real ∈ [0,1]    Quizzes completed ÷ views.
prop_love         real ∈ [0,1]    "Love" reactions ÷ views.
prop_like         real ∈ [0,1]    "Like" reactions ÷ views.
prop_dislike      real ∈ [0,1]    "Dislike" reactions ÷ views.
prop_reject       real ∈ [0,1]    Explicit rejects ÷ views.
⚠️

Chicken-and-egg effect. The proportions reflect views obtained under the current algorithm. A publication that was barely surfaced may have received fewer views. Always work in proportions, never in absolute values, and keep in mind that publications with very few views may be noisy.
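
As a minimal illustration of that caveat, the dataset can be loaded and low-view rows flagged before any scoring. A sketch assuming pandas, a local file named dataset.csv (hypothetical name; use the file obtained from this page) and an arbitrary 50-view cutoff:

import pandas as pd

# Hypothetical file name; adjust to the actual download.
df = pd.read_csv("dataset.csv", parse_dates=["published_at", "date"])

# Proportions computed from very few views are noisy: flag them
# instead of taking them at face value. 50 is an arbitrary cutoff.
MIN_VIEWS = 50
df["noisy"] = df["total_views"] < MIN_VIEWS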

Access the dataset

Downloading requires a Bulle account. This restriction lets us keep track of who accesses the dataset and limit abuse. Only people who legitimately obtained the dataset through this page will be able to submit their work for evaluation.

Submission format

A complete submission consists of two files: a description document and the ranking CSV produced by your algorithm.

πŸ“

1. Description document

A document of at most one A4 page, in 12-point type, explaining: the evaluation metric you propose, the corresponding algorithm, and the associated formulas. Formulas must be simple, explicit, and rely directly on the variables in the table (proportions, dates, status, etc.).

πŸ“Š

2. Ranking CSV

The output of your algorithm applied to the dataset, as a submission.csv file with three columns: date, publication_id, rank. One row per (date, publication_id) pair present in the dataset; within each day, ranks must be unique and gap-free, starting at 1.

# submission.csv, three columns, header required
date,publication_id,rank
2026-04-24,3f4b2490-8aad-5949-9fdc-e40e4566e5d3,1
2026-04-24,...,2
2026-04-25,...,1
# One row per (date, publication_id) pair in the dataset.
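Before submitting, a quick self-check can catch most format errors. A minimal sketch, assuming pandas (the comment="#" option skips annotation lines like those shown above):

import pandas as pd

sub = pd.read_csv("submission.csv", comment="#")

# Within each day, ranks must be exactly 1..n: no gaps, no duplicates.
for day, grp in sub.groupby("date"):
    expected = list(range(1, len(grp) + 1))
    assert sorted(grp["rank"]) == expected, f"bad ranks on {day}"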
πŸ“₯

How to submit. Compress both files (PDF + CSV) into a .zip file and email it to contact@shabon.fr with subject "Bulle Algorithm Challenge", before June 1st, 2026 at 11:59 pm (Paris time). One submission per participant or team. In case of multiple submissions, only the latest one is retained.

Display of submissions

A dedicated page will be published to showcase the rankings actually submitted, displaying the publications each ranking proposes for a given day. Participants will be able to visualise and compare their respective algorithms.

This page will be announced publicly after the submission deadline.

Recognition

Bulle is a mission-driven project, run by SHABON. The reward for this challenge is first and foremost public recognition of the work done.

πŸ“°

Dedicated article

The selected proposal will be featured in a public article on Bulle's blog, presenting the method, the author and the approach.

πŸŽ“

Open to all

Individuals, students, teams, labs, non-profits, companies. One restriction only: the code and method must be publicly explainable.

🀝

Collaboration

The selected proposal will lead to a deeper collaboration with the Bulle team to implement the algorithm in the platform.

πŸ”—

Public credit

If the solution is integrated into Bulle, its author is publicly credited on the algorithm page.

Evaluation metric leads (tied to the algorithm)

For reference only, here are a few leads we have considered. They are neither imposed nor definitive: your task is precisely to propose something better, or to combine them differently.

Lead #1: Weighted intrinsic quality

We combine the engagement signals with weights that reflect what Bulle values. Bulle's current weighting favours reflection (quizzes) and voluntary sharing over a plain "like", and penalises negative signals.

With the indicative weights, the score reads

Q = (4·q + 3·r + 2·s + 2·ℓ₂ + ℓ₁ − ν₁ − 2·ν₂)₊

where q = prop_quiz, r = prop_reposts, s = prop_shares, ℓ₂ = prop_love, ℓ₁ = prop_like, ν₁ = prop_dislike, ν₂ = prop_reject, and (x)₊ = max(0, x).

The exact weights remain at your discretion. The goal is for them to objectively reflect the principles stated above.
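
As one possible reading of this lead, the indicative weights translate directly into a vectorised score. A sketch assuming pandas and the column names from the dataset table:

import pandas as pd

def quality(df: pd.DataFrame) -> pd.Series:
    # Weighted sum of the proportion signals, indicative weights above.
    raw = (4 * df["prop_quiz"] + 3 * df["prop_reposts"]
           + 2 * df["prop_shares"] + 2 * df["prop_love"]
           + df["prop_like"] - df["prop_dislike"] - 2 * df["prop_reject"])
    return raw.clip(lower=0)  # (x)₊ = max(0, x)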

Lead #2: Soft temporal decay

Rather than a short half-life that quickly extinguishes content, a simple exponential D(t) = exp(−t / τ), with a time constant τ to be calibrated, allows a more gradual decay. As an indicative range, τ ≈ 14 to 30 days gives quality content time to be discovered without freezing the feed.

The constant Ο„ should be calibrated to the target editorial pace. Too small and old information is extinguished, too large and the feed is saturated with old content.

Lead #3: Creator fairness (round-robin)

To prevent a single creator from saturating the top of the ranking, a decreasing factor can be applied at each repetition: a publication's value is divided by 1 plus the number of publications by the same creator already ranked above.
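
A greedy sketch of that rule in plain Python (the tuple layout and function name are illustrative, not part of the challenge):

def rank_with_rotation(pubs):
    # pubs: list of (publication_id, creator_id, value) tuples.
    remaining = list(pubs)
    ranked_above = {}  # creator_id -> publications already ranked
    ranking = []
    while remaining:
        # Effective value = value / (1 + same-creator publications above).
        best = max(remaining,
                   key=lambda p: p[2] / (1 + ranked_above.get(p[1], 0)))
        remaining.remove(best)
        ranked_above[best[1]] = ranked_above.get(best[1], 0) + 1
        ranking.append(best[0])
    return ranking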

Lead #4: Aggregation over the ranking

Once each publication's value is estimated, the day's overall ranking can be evaluated with an nDCG (normalised discounted cumulative gain). The idea: heavily reward good decisions at the top of the ranking, much less at the bottom. The final score is averaged across all evaluated days.

Other aggregations are conceivable (weighted sum, Spearman, MAP…). It is up to you to justify your choice.
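
For instance, a per-day nDCG over value estimates fits in a few lines. A sketch using the standard log₂ discount (one common choice among several):

import math

def ndcg(values_in_rank_order):
    # values_in_rank_order: each publication's estimated value,
    # listed in submitted order (rank 1 first).
    def dcg(vals):
        return sum(v / math.log2(i + 2) for i, v in enumerate(vals))
    ideal = dcg(sorted(values_in_rank_order, reverse=True))
    return dcg(values_in_rank_order) / ideal if ideal > 0 else 0.0

# Final score: the mean of ndcg(...) across all evaluated days.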

πŸ’‘

Important. These formulas are only a starting point. The challenge is precisely to propose a public evaluation metric that objectively meets the principles stated above, along with a concrete algorithm that complies with it. You may reuse these leads, adapt them, or propose entirely new ones.

Known limitations