Under the Hood
How NFTruth's AI Detection Engine Works
Deep dive into the machine learning architecture and data pipeline that powers NFT fraud detection
System Architecture
A comprehensive AI-powered fraud detection system
NFTruth/
├── app/
│   ├── data/
│   │   ├── opensea_collector.py # OpenSea API integration & data collection
│   │   ├── reddit_collector.py # Reddit OAuth + sentiment analysis pipeline
│   │   ├── etherscan_collector.py # Ethereum blockchain analysis
│   │   └── ml_data_transformer.py # Feature engineering & ML data preparation
│   ├── models/
│   │   ├── model.py # Ensemble ML model implementation
│   │   ├── model_notebook.ipynb # Technical documentation & explanation
│   │   └── opensea_known_legit.py # Curated legitimate collections database
│   ├── model_training.py # Synthetic data generation & training pipeline
│   ├── predict.py # Prediction interface & risk assessment
│   └── opensea_collections.py # Collection slug mappings
├── model_outputs/
│   └── rule_based_model.json # Rule-based baseline model
├── training_data/ # Generated training datasets
├── tests/
│   ├── test_model_setup.py # ML functionality validation
│   └── test_opensea.py # API connection testing
├── requirements.txt # Python dependencies
└── README.md # Documentation
How the System Works
Multi-stage AI analysis pipeline for comprehensive fraud detection
Multi-Source Data Collection Pipeline
OpenSea API Integration
- Collection verification status (safelist_status)
- Trading statistics (total_volume, floor_price, market_cap)
- Social presence (Discord, Twitter links)
- Ownership metrics (total_supply, num_owners)
- Price dynamics (average_price, price_changes)
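The fields above could be flattened into a single feature record roughly as follows. This is a minimal sketch, not the project's collector: the function name `extract_opensea_features`, the input keys `discord_url` and `twitter_username`, the split into a metadata dict and a stats dict, and the use of `"verified"` as the safelist value that marks a verified collection are all assumptions made for illustration.

```python
from typing import Any, Dict, Mapping


def extract_opensea_features(collection: Mapping[str, Any],
                             stats: Mapping[str, Any]) -> Dict[str, Any]:
    """Flatten raw collection metadata and stats into one feature record.

    Both input shapes are hypothetical; missing or null values fall back
    to neutral defaults so downstream code never sees None.
    """
    safelist_status = collection.get("safelist_status") or "not_requested"
    return {
        "safelist_status": safelist_status,
        # Assumption: "verified" is the status that marks a verified collection.
        "is_verified": int(safelist_status == "verified"),
        "has_discord": int(bool(collection.get("discord_url"))),
        "has_twitter": int(bool(collection.get("twitter_username"))),
        "total_supply": int(collection.get("total_supply") or 0),
        "num_owners": int(stats.get("num_owners") or 0),
        "total_volume": float(stats.get("total_volume") or 0.0),
        "floor_price": float(stats.get("floor_price") or 0.0),
        "market_cap": float(stats.get("market_cap") or 0.0),
        "average_price": float(stats.get("average_price") or 0.0),
    }


sample = extract_opensea_features(
    {"safelist_status": "verified",
     "discord_url": "https://discord.gg/example",
     "twitter_username": None,
     "total_supply": 10000},
    {"num_owners": 5500, "total_volume": 1234.5, "floor_price": 0.8},
)
```

Encoding the boolean signals as 0/1 keeps the record directly usable as numeric model input.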
Reddit Social Intelligence
- OAuth 2.0 authentication with Reddit API
- Multi-subreddit targeted data collection
- VADER sentiment analysis integration
- Scam keyword detection: ['scam', 'rugpull', 'fake', 'fraud']
- Hype indicator tracking: ['moon', 'diamond hands', 'hodl']
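The keyword lists above might feed a simple density measure like the one sketched here. The source does not say how the keyword signals are scored, so the hits-per-word definition and the function name `keyword_density` are assumptions; VADER sentiment scoring is left out of this sketch.

```python
import re

SCAM_KEYWORDS = ["scam", "rugpull", "fake", "fraud"]
HYPE_KEYWORDS = ["moon", "diamond hands", "hodl"]


def keyword_density(texts, keywords):
    """Return keyword hits per word across all texts (0.0 if there are no words)."""
    total_words = 0
    hits = 0
    for text in texts:
        lowered = text.lower()
        total_words += len(re.findall(r"[a-z0-9']+", lowered))
        for keyword in keywords:
            # Word boundaries keep "fake" from matching inside "fakery".
            pattern = r"\b" + re.escape(keyword) + r"\b"
            hits += len(re.findall(pattern, lowered))
    return hits / total_words if total_words else 0.0


posts = [
    "This project is a scam, total rugpull",
    "To the moon! Diamond hands only",
]
scam_density = keyword_density(posts, SCAM_KEYWORDS)
hype_density = keyword_density(posts, HYPE_KEYWORDS)
```

Normalizing by word count keeps long threads from looking more suspicious than short ones simply because they contain more text.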
Blockchain Analysis
- Creator wallet and transaction history
- Suspicious pattern detection (wash trading)
- Mint distribution pattern recognition
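As an illustration of what a wash-trading signal can look like, the sketch below flags tokens that bounce back and forth between the same two wallets. The source does not describe its detection logic, so this round-trip heuristic, the `(token_id, seller, buyer)` sale format, and the function name are all assumptions.

```python
def wash_trading_score(sales):
    """Fraction of sales whose token was also sold in the reverse direction
    between the same two wallets. A self-trade (seller == buyer) is its own
    reverse and is therefore flagged as well.

    `sales` is a list of hypothetical (token_id, seller, buyer) tuples.
    """
    if not sales:
        return 0.0
    observed = set(sales)
    flagged = sum(
        1
        for token_id, seller, buyer in sales
        if (token_id, buyer, seller) in observed
    )
    return flagged / len(sales)


sales = [
    (1, "0xA", "0xB"),
    (1, "0xB", "0xA"),  # the same token goes straight back
    (2, "0xC", "0xD"),
    (3, "0xE", "0xF"),
]
score = wash_trading_score(sales)
```

A real detector would likely also consider timing, price, and longer trade cycles; this version only captures the simplest two-wallet loop.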
Advanced Feature Engineering
The MLDataTransformer class transforms the raw data into 30 ML features, grouped into categories such as:
Market Intelligence
Social Sentiment
Ensemble Machine Learning Architecture
Four specialized algorithms working together:
- Random Forest: complex interactions
- Gradient Boosting: sequential learning
- Logistic Regression: interpretable patterns
- Support Vector Machine: optimal boundaries
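The source does not state how the four models' outputs are combined. One common option is weighted soft voting, sketched below in plain Python; the model names, probabilities, and weights are made up for illustration. scikit-learn's `VotingClassifier` with `voting="soft"` provides the same idea for fitted estimators.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model scam probabilities.

    `probabilities` maps a model name to its predicted scam probability.
    `weights` maps the same names to non-negative weights (default: equal).
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    if weights is None:
        weights = {name: 1.0 for name in probabilities}
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(p * weights[name] for name, p in probabilities.items())
    return weighted_sum / total_weight


# Hypothetical per-model outputs for one collection.
model_probs = {
    "random_forest": 0.20,
    "gradient_boosting": 0.30,
    "logistic_regression": 0.10,
    "svm": 0.40,
}
equal_vote = soft_vote(model_probs)
weighted_vote = soft_vote(
    model_probs,
    {"random_forest": 1, "gradient_boosting": 1,
     "logistic_regression": 2, "svm": 1},
)
```

Giving one model a larger weight pulls the combined score toward that model's prediction while still letting the others temper it.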
Intelligent Risk Assessment
Our ensemble AI models provide comprehensive risk analysis with detailed confidence scores:
Analysis Results
Legitimate: safe to proceed. This collection shows strong legitimacy indicators across all analysis categories. Low fraud risk detected.
Risk Classification System
| Risk Level | Score Range | Characteristics | Action Recommended |
|---|---|---|---|
| Low Risk | 0-30% | Verified, high volume, strong community | Relatively safe to proceed |
| Medium Risk | 31-50% | Mixed signals, some concerns | Proceed with caution |
| High Risk | 51-70% | Multiple red flags detected | High caution advised |
| Very High Risk | 71-100% | Strong scam indicators | Avoid completely |
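The table maps directly onto a small lookup function. This is a sketch using the thresholds above; treating each upper bound as inclusive for fractional scores (so 30.5 counts as Medium Risk) is an assumption, as is the function name.

```python
def classify_risk(score_percent):
    """Map a 0-100 risk score to the risk level named in the table."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent <= 30:
        return "Low Risk"
    if score_percent <= 50:
        return "Medium Risk"
    if score_percent <= 70:
        return "High Risk"
    return "Very High Risk"


levels = [classify_risk(score) for score in (12, 45, 64, 88)]
```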
Complete Feature Analysis
30 features analyzed across four key categories
Market Intelligence
9 features
- total_volume, floor_price
- average_price, market_cap
- volume_per_owner
- market_efficiency
- price_premium
- avg_daily_volume
- liquidity_indicator
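Some of the derived features in this list are plausibly simple ratios of the raw OpenSea fields. The source does not give their formulas, so the definitions below are assumptions chosen for illustration, and `age_days` is a hypothetical input that does not appear among the collected fields.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_market_features(total_volume, num_owners, floor_price,
                           average_price, age_days):
    """Compute three ratio features from raw market statistics.

    All three formulas are illustrative guesses, not the project's definitions.
    """
    return {
        # Trading volume attributable to each holder.
        "volume_per_owner": safe_div(total_volume, num_owners),
        # How far typical sale prices sit above the floor.
        "price_premium": safe_div(average_price, floor_price),
        # Volume averaged over the collection's lifetime in days.
        "avg_daily_volume": safe_div(total_volume, age_days),
    }


features = derive_market_features(
    total_volume=1000.0, num_owners=500, floor_price=2.0,
    average_price=3.0, age_days=100,
)
```

Guarding every division matters here because brand-new or abandoned collections routinely report zero owners, zero floor price, or zero age.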
Collection Properties
8 features
- is_verified, safelist_status
- has_discord, has_twitter
- trait_offers_enabled
- collection_offers_enabled
- total_supply, num_owners
Social Intelligence
6 features
- reddit_mentions
- reddit_engagement
- social_score
- reddit_sentiment
- scam_keyword_density
- hype_indicator
Blockchain Forensics
7 features
- creator_wallet_age_days
- creator_transaction_count
- wash_trading_score
- suspicious_patterns
- mint_distribution_score
- whale_concentration
- creator_balance_eth
Technical Implementation Stack
Core Dependencies
- Machine learning and data processing: numpy, pandas, scikit-learn, joblib
- API and web functionality: requests, python-dotenv
- Natural language processing: nltk, vaderSentiment, textblob
- Data visualization: matplotlib, seaborn
- Date handling: python-dateutil, pytz
External APIs
- OpenSea API: collection marketplace data
- Reddit API: social sentiment analysis (OAuth 2.0)
- Etherscan API: Ethereum blockchain data
Model Performance Metrics
Logistic Regression performed best of the four models and serves as the primary classifier.
| Model | Key Strengths | Use Case |
|---|---|---|
| Logistic Regression | Interpretable, fast, linear separability | Primary classifier for NFT authenticity |
| Random Forest | Feature importance, non-linear patterns | Complex interaction detection |
| Gradient Boosting | Sequential improvement, weak signal boosting | Subtle scam pattern recognition |
| SVM | Maximum margin, high-dimensional separation | Precise decision boundaries |
Future Enhancement Roadmap
- Real-time Monitoring: live collection tracking dashboard with alerts
- Enhanced Web Interface: advanced analytics and visualization tools
- Community Reporting: crowdsourced scam detection system
Important Disclaimers
Always conduct your own research (DYOR) before making any financial decisions.
Ready to Analyze NFT Collections?
Built with ❤️ to make the NFT space safer for everyone. Start your AI-powered fraud detection analysis today!