Reddit source validation for content sourcing
When agents mine Reddit for victim stories, scam reports, tips, or product feedback, VeracityAPI can flag suspiciously generic or weak-provenance posts before they become source material.
Business value
- Protects content integrity before bad sources enter the research corpus.
- Reduces risk from AI-planted marketing, competitor sabotage, astroturfing, or generic reposts.
- Helps research agents prioritize firsthand accounts with concrete details.
Agent job to be done
Act as a source triage filter. Score candidate posts/comments, keep credible-specific sources, and send suspicious ones to verification or discard queues.
- format: social_post
- intended_use: cite
- domain: Reddit source validation / travel scams
When to call VeracityAPI
Run after Reddit search/scrape and deduplication, before summarization or content generation.
What text to submit
Post title, body, selected comments, subreddit, timestamp, author age/karma if available, and permalink. VeracityAPI scores text; your agent should keep metadata separately.
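As a sketch, the split between scored text and retained metadata might look like this (the field names are illustrative, not a documented Reddit or VeracityAPI schema):

```python
# Hypothetical helper: concatenate the scorable text and keep Reddit
# metadata in a separate record, as recommended above.

def build_candidate(post: dict) -> tuple[str, dict]:
    """Return (text_to_score, metadata) for one Reddit post."""
    parts = [
        post.get("title", ""),
        post.get("body", ""),
        " ".join(post.get("selected_comments", [])),
    ]
    text = "\n\n".join(p for p in parts if p)
    metadata = {
        "subreddit": post.get("subreddit"),
        "timestamp": post.get("created_utc"),
        "author_age_days": post.get("author_age_days"),
        "author_karma": post.get("author_karma"),
        "permalink": post.get("permalink"),
    }
    return text, metadata
```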
Decision policy
- allow: low risk with concrete first-person details, timestamps, places, sequence of events, and no obvious promotion.
- human_review: medium risk, missing provenance, or source is important to a claim.
- reject: high risk for citation/training workflows, especially if evidence flags generic phrasing or unsupported claims.
- local override: never cite a Reddit post as fact without corroboration; use it as lead/source material.
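The policy above can be sketched as a single dispatch function. The `recommended_action` values follow the response fields referenced in this guide; the queue names and the corroboration flag are local assumptions:

```python
# Minimal sketch of the decision policy, including the local override
# that allowed posts remain leads until corroborated.

def triage(result: dict, corroborated: bool) -> str:
    """Map a VeracityAPI result to a local queue for the triage filter."""
    action = result.get("recommended_action")
    if action == "reject":
        return "discard"
    if action == "human_review":
        return "verification_queue"
    if action == "allow":
        # Local override: treat even allowed posts as leads, not facts,
        # until an independent source corroborates them.
        return "cite" if corroborated else "corroboration_queue"
    return "verification_queue"  # unknown action: fail toward human review
```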
Request template
```shell
curl https://api.veracityapi.com/v1/analyze \
  -H "Authorization: Bearer DOC_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type":"text","content":"Paste content here","context":{"format":"social_post","intended_use":"cite"}}'
```
Automation recipe
- Research agent collects candidate Reddit posts.
- Deduplicate near-identical stories.
- Score each candidate with intended_use=cite.
- Rank by content_trust_score and evidence quality.
- Send top sources to corroboration search; discard or quarantine high-risk posts.
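The recipe above can be orchestrated as a single pass. Here `score` stands in for a VeracityAPI call with intended_use=cite, while `corroborate` and the dedup key are placeholders for your agent's own tooling; the `content_trust_score` and `risk_level` fields are those referenced elsewhere in this guide:

```python
# Sketch of the automation recipe: dedupe, score, rank, then route.

def run_pipeline(candidates, score, corroborate, top_n=5):
    seen, unique = set(), []
    for post in candidates:                      # step 2: drop near-identical stories
        key = post["body"][:200].lower()         # naive dedup key; swap in real near-dup logic
        if key not in seen:
            seen.add(key)
            unique.append(post)
    scored = [(score(p), p) for p in unique]     # step 3: score with intended_use=cite
    scored.sort(key=lambda sp: sp[0]["content_trust_score"], reverse=True)  # step 4: rank
    kept, quarantined = [], []
    for result, post in scored:
        if result["risk_level"] == "high":       # step 5: quarantine high-risk posts
            quarantined.append(post)
        elif len(kept) < top_n:
            kept.append((post, corroborate(post)))  # step 5: corroboration search
    return kept, quarantined
```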
Evidence spans agents should inspect
- story lacks place/time/sequence details
- generic warning language without lived experience
- promotional phrasing
- claims that read like summarized advice rather than firsthand report
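One way to surface the spans above first is to bucket evidence by pattern. The `category` values here are assumptions about the evidence payload, not a documented enum:

```python
# Illustrative filter over evidence spans; adjust the set to whatever
# categories your VeracityAPI responses actually carry.

WEAK_PATTERNS = {
    "missing_specifics",     # no place/time/sequence details
    "generic_warning",       # warning language without lived experience
    "promotional",           # promotional phrasing
    "summarized_advice",     # reads like advice, not a firsthand report
}

def weak_spans(evidence: list[dict]) -> list[dict]:
    """Return only the spans an agent should inspect first."""
    return [span for span in evidence if span.get("category") in WEAK_PATTERNS]
```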
Policy pseudocode
```js
if (result.recommended_action === "allow") continueWorkflow();
if (result.recommended_action === "revise") rewriteWith(result.evidence, result.recommended_fixes);
if (result.recommended_action === "human_review") queueForHumanReview(result);
if (result.recommended_action === "reject") discardOrRebuild();
```
KPIs to track
- percentage of scraped posts filtered
- corroboration success rate
- human reviewer acceptance rate
- bad-source incidents prevented
- research time saved
What can go wrong
- VeracityAPI cannot prove a Reddit user is real.
- A truthful victim may write vaguely; do not reject high-impact sources without corroboration review.
- Combine with account metadata, cross-source search, and moderator signals.
Cost and latency notes
Analyze-only costs $0.005 per 1,000 characters; analyze + revise (auto_revise=true) costs $0.010 per 1,000 characters. Both round up to the nearest 1,000 characters. A short caption or email usually costs $0.005; longer pages or chapters scale linearly with length. Current v0.1 latency is LLM-bound, so batched or concurrent orchestration is recommended for high-volume pipelines.
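A quick worked example of the rounding, as a sketch:

```python
import math

# Both pricing tiers round up to the nearest 1,000 characters, so an
# 800-character post bills as 1,000 and a 2,500-character post as 3,000.

def analyze_cost(chars: int, auto_revise: bool = False) -> float:
    """Estimated USD cost for one analyze call."""
    rate = 0.010 if auto_revise else 0.005
    return math.ceil(chars / 1000) * rate
```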
Agent evaluation checklist
- Does this workflow have a costly failure mode from generic or weak-provenance text?
- Can the agent map evidence spans back to editable source locations?
- Should this workflow fail open, fail closed, or queue human review if VeracityAPI is unavailable?
- Which field drives policy: recommended_action, risk_level, content_trust_score, specificity_risk, or provenance_weakness?
- What local rule should complement the API score?