NFTruth

Under the Hood

How NFTruth's AI Detection Engine Works

Deep dive into the machine learning architecture and data pipeline that powers NFT fraud detection

🏗️ System Architecture

A comprehensive AI-powered fraud detection system

NFTruth/
├── 🎯 app/
│   ├── 📊 data/
│   │   ├── opensea_collector.py      # OpenSea API integration & data collection
│   │   ├── reddit_collector.py       # Reddit OAuth + sentiment analysis pipeline
│   │   ├── etherscan_collector.py    # Ethereum blockchain analysis
│   │   └── ml_data_transformer.py    # Feature engineering & ML data preparation
│   ├── 🤖 models/
│   │   ├── model.py                  # Ensemble ML model implementation
│   │   ├── model_notebook.ipynb      # Technical documentation & explanation
│   │   └── opensea_known_legit.py    # Curated legitimate collections database
│   ├── 📈 model_training.py          # Synthetic data generation & training pipeline
│   ├── 🔮 predict.py                 # Prediction interface & risk assessment
│   └── 📋 opensea_collections.py     # Collection slug mappings
├── 🏆 model_outputs/
│   └── rule_based_model.json         # Rule-based baseline model
├── 📚 training_data/                 # Generated training datasets
├── 🧪 tests/
│   ├── test_model_setup.py           # ML functionality validation
│   └── test_opensea.py               # API connection testing
├── 📋 requirements.txt               # Python dependencies
└── 📖 README.md                      # Documentation

🧠 How The System Works

Multi-stage AI analysis pipeline for comprehensive fraud detection

1. 📊 Multi-Source Data Collection Pipeline

🏪 OpenSea API Integration

  • Collection verification status (safelist_status)
  • Trading statistics (total_volume, floor_price, market_cap)
  • Social presence (Discord, Twitter links)
  • Ownership metrics (total_supply, num_owners)
  • Price dynamics (average_price, price_changes)

💬 Reddit Social Intelligence

  • OAuth 2.0 authentication with Reddit API
  • Multi-subreddit targeted data collection
  • VADER sentiment analysis integration
  • Scam keyword detection: ['scam', 'rugpull', 'fake', 'fraud']
  • Hype indicator tracking: ['moon', 'diamond hands', 'hodl']
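
The keyword detection described above could be sketched as follows. This is a minimal illustration, not NFTruth's actual collector code: the function names `count_keyword_posts` and `keyword_density`, the whole-word matching rule, and the sample posts are all assumptions made for the example. Only the two keyword lists come from the page.

```python
import re

# Keyword lists taken from the bullets above.
SCAM_KEYWORDS = ["scam", "rugpull", "fake", "fraud"]
HYPE_KEYWORDS = ["moon", "diamond hands", "hodl"]


def count_keyword_posts(posts, keywords):
    """Count posts that mention at least one keyword as a whole word or phrase."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return sum(1 for text in posts if pattern.search(text))


def keyword_density(posts, keywords):
    """Share of posts mentioning any keyword; 0.0 when there are no posts."""
    if not posts:
        return 0.0
    return count_keyword_posts(posts, keywords) / len(posts)


# Hypothetical posts, for illustration only.
posts = [
    "Is this collection a SCAM or legit?",
    "Diamond hands, we are going to the moon",
    "Floor price looks stable this week",
    "Honeymoon phase is over",  # "moon" inside another word is not counted
]
scam_density = keyword_density(posts, SCAM_KEYWORDS)
hype_density = keyword_density(posts, HYPE_KEYWORDS)
```

Matching on word boundaries avoids counting substrings such as "moon" in "honeymoon"; whether the real pipeline does this is not stated on the page.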

⛓️ Blockchain Analysis

  • Creator wallet and transaction history
  • Suspicious pattern detection (wash trading)
  • Mint distribution pattern recognition
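
The page does not say how wash trading is detected. One simple heuristic, shown here purely as an assumption, is to measure how often a token is transferred back to a wallet that already held it. The function name `round_trip_ratio`, the `(token_id, sender, receiver)` tuple layout, and the addresses are hypothetical.

```python
from collections import defaultdict


def round_trip_ratio(transfers):
    """Fraction of transfers that send a token to a wallet that already held it.

    `transfers` is a chronologically ordered list of (token_id, sender, receiver).
    """
    if not transfers:
        return 0.0
    past_holders = defaultdict(set)  # token_id -> wallets that have held it
    round_trips = 0
    for token_id, sender, receiver in transfers:
        if receiver in past_holders[token_id]:
            round_trips += 1
        past_holders[token_id].update((sender, receiver))
    return round_trips / len(transfers)


# Hypothetical transfer history, for illustration only.
transfers = [
    (1, "0xA", "0xB"),
    (1, "0xB", "0xA"),  # token 1 returns to 0xA: counted
    (1, "0xA", "0xB"),  # and goes back to 0xB again: counted
    (2, "0xC", "0xD"),  # ordinary one-way sale
]
score = round_trip_ratio(transfers)
```

A high ratio is only a signal: legitimate holders also buy back tokens, so a production score would likely combine this with timing and price information.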

2. 🔬 Advanced Feature Engineering

The MLDataTransformer class transforms raw data into 20+ meaningful ML features:

💰 Market Intelligence

volume_per_owner = total_volume / num_owners
market_efficiency = market_cap / total_volume
price_premium = average_price / floor_price
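
The three ratios above divide by values that can be zero for new or inactive collections (no owners, no volume, no floor price). A minimal sketch of computing them defensively is shown below. The helper `safe_ratio`, the `market_features` function, the choice of 0.0 as the fallback, and the sample numbers are assumptions for illustration, not the `MLDataTransformer` implementation.

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Divide, returning `default` when the denominator is zero or missing."""
    if not denominator:
        return default
    return numerator / denominator


def market_features(stats):
    """Compute the three derived ratios from a dict of raw collection stats."""
    return {
        "volume_per_owner": safe_ratio(stats["total_volume"], stats["num_owners"]),
        "market_efficiency": safe_ratio(stats["market_cap"], stats["total_volume"]),
        "price_premium": safe_ratio(stats["average_price"], stats["floor_price"]),
    }


# Illustrative numbers only, not real collection data.
example = {
    "total_volume": 1000.0,
    "num_owners": 250,
    "market_cap": 5000.0,
    "average_price": 3.0,
    "floor_price": 2.0,
}
features = market_features(example)
```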

🗣️ Social Sentiment

social_score = reddit_mentions + engagement
sentiment_analysis = VADER.polarity_scores()
scam_density = scam_mentions / total_mentions

3. 🤖 Ensemble Machine Learning Architecture

Four specialized algorithms working together:

  • 🌳 Random Forest: Complex interactions
  • 🚀 Gradient Boosting: Sequential learning
  • 📈 Logistic Regression: Interpretable patterns
  • 🎯 Support Vector Machine: Optimal boundaries
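
The page does not state how the four models are combined. Because the sample result further down reports a single "Model Used", one plausible reading is that each model is scored on held-out data and the best one is used for prediction. The sketch below illustrates that selection step under this assumption; the function names, labels, and predictions are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best_model(y_true, predictions_by_model):
    """Return (model_name, accuracy) for the highest-scoring model."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up validation labels (1 = suspicious, 0 = legitimate) and predictions.
y_val = [0, 0, 1, 1, 0, 1, 0, 1]
val_predictions = {
    "RandomForest":       [0, 0, 1, 0, 0, 1, 0, 0],
    "GradientBoosting":   [0, 1, 1, 1, 0, 1, 0, 0],
    "LogisticRegression": [0, 0, 1, 1, 0, 1, 0, 0],
    "SVM":                [1, 0, 1, 1, 0, 0, 0, 0],
}
best_name, best_acc = select_best_model(y_val, val_predictions)
```

An alternative design would average the four models' probabilities (soft voting) instead of picking one; the page does not indicate which approach NFTruth takes.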

4. 🎯 Intelligent Risk Assessment

Our ensemble AI models provide comprehensive risk analysis with detailed confidence scores:

Analysis Results

Result: Legitimate
Collection: "bored-ape-yacht-club"
Model Used: LogisticRegression
Legitimate Confidence: 84.7%
Suspicious Risk: 15.3%

  • Market Intelligence: 92% (Excellent)
  • Social Presence: 89% (Good)
  • Verification: 95% (Verified)
  • Blockchain: 78% (Good)

Risk Level: Low
AI Accuracy: 97.2%
Features: 30+
Analysis Time: <3s

AI Recommendation

✅ Safe to proceed - This collection shows strong legitimacy indicators across all analysis categories. Low fraud risk detected.

⚠️ Risk Classification System

Risk Level        | Score Range | Characteristics                         | Action Recommended
🟢 Low Risk       | 0-30%       | Verified, high volume, strong community | ✅ Relatively safe to proceed
🟡 Medium Risk    | 31-50%      | Mixed signals, some concerns            | ⚠️ Proceed with caution
🟠 High Risk      | 51-70%      | Multiple red flags detected             | 🚨 High caution advised
🔴 Very High Risk | 71-100%     | Strong scam indicators                  | ❌ Avoid completely
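
The banding in the table above can be expressed as a small lookup function. This is a sketch, not the code in `predict.py`: the name `risk_level` is hypothetical, and because the table only lists whole-number ranges, treating a fractional score such as 30.5% as belonging to the next band up is an assumption.

```python
def risk_level(suspicious_pct):
    """Map a suspicious-risk percentage (0-100) to the four bands in the table."""
    if not 0 <= suspicious_pct <= 100:
        raise ValueError("score must be between 0 and 100")
    if suspicious_pct <= 30:
        return "Low Risk"
    if suspicious_pct <= 50:
        return "Medium Risk"
    if suspicious_pct <= 70:
        return "High Risk"
    return "Very High Risk"


# 15.3 is the suspicious risk shown in the sample analysis.
level = risk_level(15.3)
```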

🔍 Complete Feature Analysis

30+ data points analyzed across four key categories

📊 Market Intelligence

9 features

  • total_volume, floor_price
  • average_price, market_cap
  • volume_per_owner
  • market_efficiency
  • price_premium
  • avg_daily_volume
  • liquidity_indicator

🏷️ Collection Properties

8 features

  • is_verified, safelist_status
  • has_discord, has_twitter
  • trait_offers_enabled
  • collection_offers_enabled
  • total_supply, num_owners

💬 Social Intelligence

6 features

  • reddit_mentions
  • reddit_engagement
  • social_score
  • reddit_sentiment
  • scam_keyword_density
  • hype_indicator

⛓️ Blockchain Forensics

7 features

  • creator_wallet_age_days
  • creator_transaction_count
  • wash_trading_score
  • suspicious_patterns
  • mint_distribution_score
  • whale_concentration
  • creator_balance_eth
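
Models trained on tabular data need the features in the same order at training and prediction time. The sketch below assembles the 30 names listed in the four categories into a fixed-order vector. The feature names come from the lists above; the structure `FEATURES_BY_CATEGORY`, the function `to_feature_vector`, the 0.0 default for missing values, and the assumption that categorical values such as safelist_status are already encoded as numbers are all illustrative.

```python
FEATURES_BY_CATEGORY = {
    "market": [
        "total_volume", "floor_price", "average_price", "market_cap",
        "volume_per_owner", "market_efficiency", "price_premium",
        "avg_daily_volume", "liquidity_indicator",
    ],
    "collection": [
        "is_verified", "safelist_status", "has_discord", "has_twitter",
        "trait_offers_enabled", "collection_offers_enabled",
        "total_supply", "num_owners",
    ],
    "social": [
        "reddit_mentions", "reddit_engagement", "social_score",
        "reddit_sentiment", "scam_keyword_density", "hype_indicator",
    ],
    "blockchain": [
        "creator_wallet_age_days", "creator_transaction_count",
        "wash_trading_score", "suspicious_patterns",
        "mint_distribution_score", "whale_concentration",
        "creator_balance_eth",
    ],
}

# Flatten once so every record is vectorized in the same column order.
FEATURE_ORDER = [name for names in FEATURES_BY_CATEGORY.values() for name in names]


def to_feature_vector(record, default=0.0):
    """Return feature values in a fixed order, filling missing ones with `default`.

    Assumes every value is already numeric (booleans become 0.0 or 1.0).
    """
    return [float(record.get(name, default)) for name in FEATURE_ORDER]


# Partial, made-up record: missing features fall back to the default.
vector = to_feature_vector({"total_volume": 1000.0, "is_verified": True})
```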

🛠️ Technical Implementation Stack

Core Dependencies

# Machine Learning & Data Processing
numpy, pandas, scikit-learn, joblib

# API & Web Functionality  
requests, python-dotenv

# Natural Language Processing
nltk, vaderSentiment, textblob

# Data Visualization
matplotlib, seaborn

# Date Handling
python-dateutil, pytz

External APIs

  • 🏪 OpenSea API: Collection marketplace data
  • 💬 Reddit API: Social sentiment analysis (OAuth 2.0)
  • ⛓️ Etherscan API: Ethereum blockchain data

📊 Model Performance Metrics

🏆 Logistic Regression performed best of the four models and serves as the primary classifier.

Model                  | Key Strengths                                | Use Case
🏆 Logistic Regression | Interpretable, fast, linear separability     | Primary classifier for NFT authenticity
🌳 Random Forest       | Feature importance, non-linear patterns      | Complex interaction detection
🚀 Gradient Boosting   | Sequential improvement, weak signal boosting | Subtle scam pattern recognition
🎯 SVM                 | Maximum margin, high-dimensional separation  | Precise decision boundaries

NFTruth by the Numbers

Making the NFT space safer with AI-powered analysis

  • Features Analyzed: 30+
  • ML Algorithms: 4
  • Accuracy Rate: 97.2%
  • Data Sources: 3

🚀 Future Enhancement Roadmap

  • 🔄 Real-time Monitoring: Live collection tracking dashboard with alerts
  • 🌐 Enhanced Web Interface: Advanced analytics and visualization tools
  • 🤝 Community Reporting: Crowdsourced scam detection system

⚖️ Important Disclaimers

⚠️ Investment Warning: This tool provides risk assessments based on observable data patterns and should not be the sole factor in investment decisions. The NFT market is highly speculative and volatile.

🔬 Research Tool: NFTruth is designed as a research and educational tool to demonstrate machine learning applications in blockchain analysis.

📊 Data Limitations: Predictions are based on publicly available data and may not capture all risk factors or market dynamics.

🎓 Educational Purpose: This system demonstrates advanced ML techniques for blockchain analysis and should be used for learning and research purposes.

Always conduct your own research (DYOR) before making any financial decisions 🧠

Ready to Analyze NFT Collections? 🛡️

Built with ❤️ to make the NFT space safer for everyone. Start your AI-powered fraud detection analysis today!