dewar

A storage place for samples of all kinds, focused mainly on large-scale collection of phishing kits and other email/website artifacts.

The basic plan at the moment is to treat each submitted archive as a “job”. There are “known good” jobs and “other” jobs. A “known good” job would be something like the WordPress installer, while an “other” job would be a backup of a compromised/phishing site. There’s likely a lot of commonality between them, but the interesting parts are the differences.
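As a rough sketch of what “the interesting parts are the differences” could look like: hash every file in a job, then drop anything whose hash also shows up in a known good job. None of this is dewar's actual code. The names file_hashes, interesting_files and make_zip are hypothetical, and it assumes jobs arrive as zip archives.

```python
import hashlib
import io
import zipfile
from typing import Dict, List


def file_hashes(archive_bytes: bytes) -> Dict[str, str]:
    """Map each file path inside a zip archive to the SHA-256 of its contents."""
    hashes = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            hashes[info.filename] = hashlib.sha256(archive.read(info)).hexdigest()
    return hashes


def interesting_files(known_good: Dict[str, str], other: Dict[str, str]) -> List[str]:
    """Return paths in `other` whose content hash never appears in `known_good`."""
    good_digests = set(known_good.values())
    return sorted(path for path, digest in other.items() if digest not in good_digests)


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip, only so the example runs without fixture files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical jobs: a stock install and a compromised copy of it.
good = file_hashes(make_zip({"index.php": b"<?php // stock"}))
kit = file_hashes(make_zip({
    "index.php": b"<?php // stock",
    "login/post.php": b"<?php mail($to, $creds);",
}))
print(interesting_files(good, kit))  # prints ['login/post.php']
```

Comparing on content hash rather than path means a stock file that has simply been moved or renamed is still filtered out.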

If you have any kind of suggestion or issue, please create a GitHub issue - I’ll gladly discuss it. Pull requests for features or fixes are even better :)

Usage

Starting the web interface:

pipenv install 
pipenv run python -m dewar web

Starting the ingestor (not … really working yet):

pipenv run python -m dewar ingestor

Internal “element” types

Random thoughts

Various bits to build

  1. ingestion methods:
    • watch a bucket
      1. “known_good” - automatically tagged as good
      2. “other” - known_good = False
    • [ ] have a simple API for submitting files, part of the frontend
  2. ingestion pipelines
    • [ ] simple single threaded widget
    • [ ] pubsub queue with multiple nodes doing things
  3. storage backends
    • s3
    • local filesystem
  4. metadata backends
    • tinydb
    • postgresql ? (not on my )
    • other?
  5. processing of samples
    • extraction of IOCs like urls, emails, IP addresses etc.
    • hilariously simple tokenization
    • image normalisation? (phistOfFury?)
    • ssdeep?
    • words/phrases etc
  6. processing pipelines
    • single job queue, processing tasks
    • pubsub multithreaded clustered hilarity
  7. Data interaction
    • website frontend for ..
    • seeing the incoming file bucket contents
    • manually processing incoming jobs - in case you want to insert notes as you do it etc
    • see the list of historical jobs
    • edit job data (typically only notes?)
    • upload jobs
    • [ ] HTTP API
    • shoving files into the job buckets
    • submitting jobs
    • querying job data?
    • querying hashes
      • have we seen this
      • extended - which jobs was this seen in, for correlation
  8. AAA…
    • is scary bizness
    • flask basic http on frontends
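For item 5, here is a minimal sketch of IOC extraction and the “hilariously simple tokenization”. It is an illustration, not the project's implementation. The regexes are deliberately loose and will produce false positives (version strings that look like IP addresses, for example), and extract_iocs, tokenize and is_ipv4 are hypothetical names.

```python
import re
from typing import Dict, List

# Loose patterns: good enough for triage, not for validation.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def is_ipv4(candidate: str) -> bool:
    """Reject dotted quads with an octet above 255, such as 999.1.1.1."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Pull de-duplicated, sorted URLs, email addresses and IPv4 addresses out of text."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "ipv4": sorted({ip for ip in IPV4_RE.findall(text) if is_ipv4(ip)}),
    }


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters, digits and underscores."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


# A made-up snippet of the sort that turns up in a phishing kit.
sample = '<?php mail("drop@example.com", "x", $c); header("Location: http://203.0.113.7/ok.php"); ?>'
print(extract_iocs(sample))
print(tokenize("Verify your PayPal account!"))
```

Returning sorted, de-duplicated lists keeps the output stable between runs, which makes it easier to compare one job against another.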

Starting a new backend implementation

Take a storage backend as an example. The “base” template is dewar.storage.Storage, and every storage backend should be importable as from dewar.storage.<backend> import Storage so that backends can be used interchangeably. The S3 implementation, for instance, is from dewar.storage.s3 import Storage.
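One thing this naming convention allows is picking a backend by name at runtime, say from a config value. A small sketch, assuming only the dewar.storage.<backend> layout described above; load_storage is a hypothetical helper, not something the project is known to ship.

```python
import importlib


def load_storage(backend: str, package: str = "dewar.storage"):
    """Return the Storage class exported by the module <package>.<backend>."""
    module = importlib.import_module(f"{package}.{backend}")
    return module.Storage


# With dewar installed, this would be equivalent to
# `from dewar.storage.s3 import Storage`:
#
#     Storage = load_storage("s3")
```

Because every backend exports the same class name, calling code does not need to know which module the class came from.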

Methods that storage backends should support (inspired by HTTP verbs)
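The method list itself isn't reproduced here, so the following is only a guess at the shape such an interface might take. The method names (get, put, head, delete) are assumptions chosen to mirror HTTP verbs, BaseStorage is a hypothetical stand-in for dewar.storage.Storage, and the local-filesystem backend is an illustration rather than the project's code.

```python
import tempfile
from pathlib import Path
from typing import Optional


class BaseStorage:
    """Hypothetical stand-in for the dewar.storage.Storage base template."""

    def get(self, key: str) -> bytes:
        raise NotImplementedError

    def put(self, key: str, data: bytes) -> None:
        raise NotImplementedError

    def head(self, key: str) -> Optional[dict]:
        raise NotImplementedError

    def delete(self, key: str) -> None:
        raise NotImplementedError


class Storage(BaseStorage):
    """Local-filesystem backend, named Storage to follow the import convention."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Refuse keys that would escape the storage root (e.g. "../etc/passwd").
        path = (self.root / key).resolve()
        if self.root not in path.parents:
            raise ValueError(f"key escapes storage root: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def head(self, key: str) -> Optional[dict]:
        # Metadata only, like an HTTP HEAD; None when the key does not exist.
        path = self._path(key)
        if not path.is_file():
            return None
        return {"size": path.stat().st_size}

    def delete(self, key: str) -> None:
        self._path(key).unlink()


with tempfile.TemporaryDirectory() as root:
    store = Storage(root)
    store.put("job-1/index.php", b"<?php ?>")
    print(store.head("job-1/index.php"))  # prints {'size': 8}
    print(store.get("job-1/index.php"))   # prints b'<?php ?>'
```

Keeping head separate from get lets callers check whether a sample exists without reading the whole file.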

Methods that metadata backends should support