Building Cassandra: A Transparent Real-Time Prediction Platform with Blockchain Integration
(Note: Cassandra is a side project I am working on to better understand how complex applications communicate through APIs.)
Prediction markets—such as Kalshi or Polymarket—often keep their algorithms behind closed doors. Cassandra seeks to change that by offering an open-source, fully transparent pipeline. It empowers users to set a custom prompt, see how predictions are made in real time, and incorporate blockchain technology for donation-based funding and immutable tracking of prediction changes.
Although still a work in progress, Cassandra serves two main purposes:
- Personal Educational Exercise – Allowing me to learn best practices in AWS infrastructure and microservices.
- Transparent Prediction Market – Allowing anyone to witness and verify the end-to-end process behind generating forecasts.
Below, we’ll walk through Cassandra’s high-level design—covering AWS services, microservices, the Vue.js frontend, and the planned blockchain integrations.
What is Cassandra?
Cassandra is more than just a prediction engine. It’s designed with a core philosophy:
- Transparency – Every step in the pipeline is visible, from data ingestion to model inference.
- Open Source – The entire codebase will be publicly available for inspection and modification. For now it lives in private repositories so that I can control server costs during testing.
- Blockchain-Enabled – Users will be able to donate Bitcoin to help fund frequent model runs, and the evolving prediction data is published to a blockchain for immutable, tamper-proof record-keeping.
By opening the “black box” behind typical prediction markets, Cassandra empowers people to tweak parameters, upload new datasets, and observe exactly how each forecast is generated. Blockchain further ensures that predictions—and their historical changes—are securely documented over time.
High-Level Architecture
Cassandra’s architecture (diagram above) consists of three main sections:
- AWS Infrastructure
- S3 Buckets & Model Training
- Frontend & Backend
AWS Infrastructure
- Amazon S3: Raw Data – The starting point for Cassandra’s data. Users or automated processes can upload raw datasets, whether they’re sports odds, financial market data, or user-generated inputs.
- AWS Glue: Clean Data Job – This Glue job cleans and transforms the raw data (removing duplicates, handling missing values, and normalizing formats), then writes the output to S3: Cleaned Data.
- AWS Glue: Prediction Data Job – After training, this second Glue job transforms new incoming data to generate predictions, saving results to S3: Predictions.
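To make the three cleaning steps of the Clean Data Job concrete, here is a minimal plain-Python sketch. It is not the actual Glue script: the function name `clean_rows`, the sample columns, and the choice to drop (rather than impute) rows with missing values are all assumptions made for illustration.

```python
from typing import Iterable


def clean_rows(rows: Iterable[dict]) -> list[dict]:
    """Normalize formats, drop rows with missing values, then deduplicate."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize: trim and lowercase column names, trim string values.
        normalized = {
            str(key).strip().lower(): value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Missing values: this sketch drops the row (imputing is another option).
        if any(value is None or value == "" for value in normalized.values()):
            continue
        # Duplicates: compare on the full normalized row.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical sample data, not taken from Cassandra's real datasets.
raw = [
    {"Team ": " Lions ", "odds": "1.85"},
    {"team": "Lions", "odds": "1.85"},  # duplicate once normalized
    {"team": "Bears", "odds": ""},      # missing value
]
print(clean_rows(raw))  # [{'team': 'Lions', 'odds': '1.85'}]
```

Normalizing before deduplicating matters here: the first two rows only count as duplicates once their column names and whitespace agree.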
Model Training with SageMaker
- SageMaker Model Training
Using the cleaned data, Amazon SageMaker trains the machine learning model. Once complete, model artifacts are saved to S3: Model Artifacts. This modular design lets you quickly retrain or swap out models as data or methodologies evolve.
Backend Services
Cassandra relies on microservices built with Flask, ensuring a clear separation of concerns:
- Flask APIs (cassandra_backend & cassandra_ml)
- cassandra_backend: Manages core application logic, currently focused on serving the frontend application with updated predictions.
- cassandra_ml: Loads the trained ML model (or references predictions from S3) to generate real-time forecasts.
- Cassandra Database – S3 currently serves as the data store: all data is held in buckets that support data collection, cleaning, and serving predictions.
By separating the logic (data processing, machine learning, and database operations) into distinct services, Cassandra can scale or evolve each part independently.
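As a rough illustration of that separation, the sketch below models the two services as plain functions that exchange JSON-style dictionaries. It is framework-agnostic on purpose: in the real project each function would sit behind a Flask route, and the names `ml_predict`, `backend_get_prediction`, and `Forecast`, along with the placeholder probability and `model_version`, are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Forecast:
    prompt: str
    probability: float
    model_version: str


def ml_predict(payload: dict) -> dict:
    """Stand-in for a cassandra_ml handler: validate input, return a forecast."""
    prompt = str(payload.get("prompt", "")).strip()
    if not prompt:
        raise ValueError("prompt is required")
    # Placeholder score; a real service would call the trained model here.
    return asdict(Forecast(prompt=prompt, probability=0.5, model_version="demo-0"))


def backend_get_prediction(prompt: str, ml_client: Callable[[dict], dict]) -> dict:
    """Stand-in for a cassandra_backend handler: delegate to the ML service
    and shape the response for the frontend."""
    forecast = ml_client({"prompt": prompt})
    return {"status": "ok", "forecast": forecast}


print(backend_get_prediction("Will it rain tomorrow?", ml_predict))
```

Passing the ML client in as an argument keeps the backend unaware of how forecasts are produced, so the model service could be swapped or scaled without touching the backend code.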
Frontend (Vue.js)
- Vue.js Frontend
The user interface, built in Vue.js, allows users to:
- Input custom parameters or prompts for prediction.
- Request real-time forecasts from the Flask APIs.
- View the breakdown of each step (data cleaning, model inference, etc.).
This transparency extends to the frontend, giving end users an in-depth look at how their inputs transform into outputs.
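One way the step-by-step breakdown could be produced is to have the pipeline record what each stage returned and pass that trace to the frontend along with the result. The sketch below assumes a hypothetical helper `run_with_trace` and two toy steps; the real cleaning and inference stages are far more involved.

```python
import time
from typing import Any, Callable


def run_with_trace(value: Any, steps: list[tuple[str, Callable[[Any], Any]]]) -> dict:
    """Run steps in order and record what each one produced."""
    trace = []
    for name, step in steps:
        started = time.perf_counter()
        value = step(value)
        trace.append({
            "step": name,
            "output": repr(value),
            "seconds": round(time.perf_counter() - started, 6),
        })
    return {"result": value, "trace": trace}


# Toy steps standing in for the real stages (assumptions for illustration).
steps = [
    ("data_cleaning", lambda rows: [r for r in rows if r is not None]),
    ("model_inference", lambda rows: sum(rows) / len(rows)),
]
outcome = run_with_trace([0.25, None, 0.75], steps)
print(outcome["result"])  # 0.5
for entry in outcome["trace"]:
    print(entry["step"], "->", entry["output"])
```

The frontend would then only need to render the `trace` list to show users how their input moved through each stage.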
Introducing Blockchain
Bitcoin Donations
Running a prediction workflow—especially at scale—incurs compute and storage costs. To keep Cassandra free and open for anyone to explore, we will add a Bitcoin donation feature. By sending BTC to a publicly listed wallet, users can:
- Fund More Frequent Predictions – Each donation will offset the cost of additional AWS Glue jobs, SageMaker training sessions, and data storage. The more donations we receive, the more often Cassandra can retrain or update its models.
- Participate in Project Development – Contributors will effectively become patrons of transparent prediction technology, supporting an open-source alternative to closed prediction markets.
Blockchain Ledger of Prediction Changes
Beyond funding, we also plan to publish Cassandra’s prediction changes to a blockchain (chain TBD). Here’s why:
- Immutable Record – By writing the evolving prediction data (or a hashed summary of it) to a decentralized ledger, we ensure that no one can retroactively alter or manipulate historical predictions.
- Enhanced Trust – Users can verify that Cassandra’s forecasts have not been tampered with. Each prediction update will be time-stamped and recorded, making the entire process more auditable.
- Open Source Transparency – Just as the model code is transparent, so too is the record of how predictions shift over time, creating a fully traceable history of Cassandra’s outputs.
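The "hashed summary" idea can be sketched with nothing more than the standard library. The example below assumes SHA-256 over a canonical JSON encoding; the function name `prediction_digest` and the record fields are hypothetical, and the step that actually publishes the digest to a chain is left out because the chain is still undecided.

```python
import hashlib
import json
from datetime import datetime, timezone


def prediction_digest(prediction: dict, recorded_at: datetime) -> dict:
    """Build the small payload that could be published instead of the full data.

    recorded_at should be timezone-aware; it is converted to UTC.
    """
    timestamp = recorded_at.astimezone(timezone.utc).isoformat()
    body = {"prediction": prediction, "recorded_at": timestamp}
    # Canonical form: sorted keys and fixed separators, so the same data
    # always produces the same bytes and therefore the same hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {
        "recorded_at": timestamp,
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


when = datetime(2025, 1, 1, tzinfo=timezone.utc)
summary = prediction_digest({"question": "example-market", "probability": 0.61}, when)
print(summary["recorded_at"])  # 2025-01-01T00:00:00+00:00
print(summary["sha256"])
```

Publishing only the digest keeps on-chain costs low while still letting anyone with the full prediction data recompute the hash and compare.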
Data Flow Breakdown
- Raw Data Ingestion – Users or scripts upload data to S3: Raw Data.
- Data Cleaning – AWS Glue: Clean Data Job processes the raw data, storing cleaned outputs in S3: Cleaned Data.
- Model Training (still planning this stage) – SageMaker trains the model using the cleaned dataset, saving the final artifacts to S3: Model Artifacts.
- Prediction Data Preparation – AWS Glue: Prediction Data Job transforms new data for inference, storing results in S3: Predictions.
- Real-Time Inference – cassandra_ml either references S3 or loads the model directly for on-demand predictions. Predictions are sent to the Vue.js frontend (and will be simultaneously hashed and published to the blockchain).
- User Interaction & Funding – The Vue.js interface allows users to donate Bitcoin to support frequent re-training. Donors can see how their contributions directly impact the platform’s ability to produce timely forecasts.
Why Transparency & Blockchain Matter
Auditability
Users can verify each step in the prediction pipeline and confirm that the historical record of forecasts hasn’t been altered.
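To show what such a check might look like, the sketch below links prediction records into a hash chain and then re-verifies it. This is one common technique for tamper evidence, not necessarily the scheme Cassandra will adopt; `entry_hash`, `build_chain`, `verify_chain`, and the all-zero `GENESIS_HASH` are names and conventions assumed for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder meaning "no previous entry"


def entry_hash(record: dict, previous_hash: str) -> str:
    """Hash a record together with the hash of the entry before it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + canonical).encode("utf-8")).hexdigest()


def build_chain(records: list[dict]) -> list[dict]:
    """Link records so each entry commits to everything recorded before it."""
    chain, previous_hash = [], GENESIS_HASH
    for record in records:
        current = entry_hash(record, previous_hash)
        chain.append({"record": record, "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; an edited record or broken link returns False."""
    previous_hash = GENESIS_HASH
    for entry in chain:
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(entry["record"], previous_hash):
            return False
        previous_hash = entry["hash"]
    return True


history = build_chain([
    {"question": "example-market", "probability": 0.61},
    {"question": "example-market", "probability": 0.64},
])
print(verify_chain(history))  # True

history[0]["record"]["probability"] = 0.99  # simulate a retroactive edit
print(verify_chain(history))  # False
```

A local check like this only detects edits if the latest hash is also anchored somewhere the editor cannot rewrite, which is the role the public blockchain entry would play.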
Open Collaboration
Because Cassandra will be open source, developers worldwide will be able to inspect, contribute to, or fork the project. Blockchain entries add another layer of verifiability, making it easier to trust collaborative efforts.
Sustainable Funding
Traditional research and development can be expensive. Bitcoin donations create a decentralized funding model, allowing Cassandra to scale its AWS resources without locking out users behind paywalls.
Lessons Learned & Best Practices
- Modular Microservices – Breaking down tasks into separate Flask APIs ensures each can be scaled or replaced independently.
- Cloud-Native Services – Leveraging AWS Glue, S3, and SageMaker saves time on infrastructure management, letting us focus on transparency and blockchain integrations.
- Community-Driven Development – Opening your codebase to the public invites feedback, bug fixes, and innovative ideas. Blockchain’s immutable record also serves as a public trust layer for any modifications.
Next Steps & Future Plans
Choosing a Blockchain
We’re evaluating different blockchains for storing Cassandra’s prediction changes. Factors include transaction costs, scalability, and developer ecosystem.
Enhanced Real-Time Streaming
Integrating Amazon Kinesis or a similar service could enable continuous model updates and near-instantaneous blockchain logging.
Expanded Crypto Payment Options
While Bitcoin is our first supported cryptocurrency, future updates may include Ethereum or other blockchain assets for donations.
Live Demo & Documentation
A publicly accessible site will allow users to test predictions, track the model’s performance, and explore how funds are utilized.
Advanced Data Visualizations
Charts, graphs, and interactive dashboards in the Vue.js frontend could make it even easier to interpret predictions and on-chain records.
Conclusion
Cassandra stands at the intersection of open-source machine learning, transparent prediction markets, and decentralized finance. By combining AWS infrastructure, microservices, a user-friendly Vue.js interface, and blockchain technology, Cassandra offers a compelling alternative to traditional prediction platforms.
- For Myself: It’s a hands-on tutorial in building scalable, cloud-native applications with real-time inference.
- For Future Users: It’s a transparent window into how predictions are formed, with the added benefit of verifying historical data on the blockchain.
- For Donors: It’s a way to directly contribute to the growth and sustainability of an open-source prediction platform.
As the project matures, we look forward to integrating new blockchains, expanding crypto payment options, and delivering more frequent, reliable predictions—all while keeping every step transparent and verifiable.