🚜📦✨ forklift
tl;dr
🐍 command line tool for keeping fgdbs in sync with their various sources
arcgis server has services
services point at file geodatabases
the problem: synchronize updates from the source
windows task scheduler and arcpy
start out with one task
truncate and append new data
it's slow
but everything is good
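
roughly what that first scheduled task boils down to; the paths and dataset names here are made up, but TruncateTable and Append are the standard arcpy geoprocessing tools for the pattern

import arcpy

# hypothetical paths for illustration: a source table and the fgdb the service reads
source = r'C:\connections\source.sde\Roads'
destination = r'C:\mapdata\transportation.gdb\Roads'

# wipe the destination, then reload everything from the source; simple, but slow
arcpy.TruncateTable_management(destination)
arcpy.Append_management(source, destination, 'NO_TEST')
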
start developing more apps
add more tasks
start guessing when to run each task based on how long the other one usually takes
start running into problems
tasks overlap or fail silently
now what?
replication
kinda complicated and requires versioning and global ids
we are not set up for that
let's make yet another task scheduler
to organize and orchestrate all of our tasks into one process
we create pallets
🚜 lifts, processes, and ships pallets
pallets are a unit of work, usually for a project
pallets consist of crates
crates are individual datasets
class Pallet(object):
    def build(self, configuration='Production'):
        '''Invoked before process and ship.'''

    def process(self):
        '''Invoked if any crates have data updates.'''

    def ship(self):
        '''Invoked whether the crates have updates or not.'''

    def post_copy_process(self):
        '''Invoked after data has been copied only
           if any crates have data updates.'''
class Crate(object):
    def __init__(self, source_name, source_workspace,
                 destination_workspace, destination_name):
        #: the name of the source data table
        self.source_name = source_name
        #: the name of the source database
        self.source_workspace = source_workspace
        #: the name of the destination database
        self.destination_workspace = destination_workspace
        #: the name of the output data table
        self.destination_name = destination_name
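
a minimal sketch of what one project's pallet could look like, using only the two classes above; the class name, the self.crates attribute, and the paths are all hypothetical, and the real forklift api wires crates up through its own helpers, which may differ

class TrailsPallet(Pallet):
    '''one pallet per project; this one keeps a trails fgdb fresh'''

    def build(self, configuration='Production'):
        # one crate per dataset to sync; source and destination are illustrative
        self.crates = [
            Crate('Trails',
                  r'C:\forklift\connections\source.sde',
                  r'C:\forklift\staging\recreation.gdb',
                  'Trails')
        ]

    def process(self):
        # runs only when a crate actually changed,
        # e.g. recalculate a derived field or rebuild an index
        pass
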
{
  "configuration": "Production",
  "warehouse": "file system location",
  "repositories": ["github repositories containing pallets"],
  "copyDestinations": ["the place map services point for data"],
  "stagingDestination": "temp file system location",
  "sendEmails": true,
  "notify": ["sgourley@utah.gov"]
}
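
assuming the config lives in a json file next to the tool (the file name here is a guess), reading it is a plain json.load

import json

with open('config.json') as config_file:
    config = json.load(config_file)

# keys mirror the block above
for repo in config['repositories']:
    print('pallets live in {}'.format(repo))
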
the 🚜 lifecycle
windows task scheduler starts it all
the first rule of 🚜 is that it does not work on any sabbath
checkout/update all of the repositories from the config
find all of the pallets in the repositories
find all of the crates in the pallets
remove all of the duplicate crates
check the source data for changes for every crate
update/create data in staging location for every crate
stop all arcgis services dependent on updated crates
make a copy of production data
copy updated staging data
remove copy of production data
start arcgis services
execute post copy process method on all updated pallets
execute ship method on all pallets
send an email with the results
🎆 done 🎆
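
the same steps squeezed into one python sketch of a run; the has_updates attribute and the elided comments are stand-ins for forklift internals not shown here, so treat the names as illustrative

def lift(pallets, config):
    # pallets would come from scanning config['repositories'] checked out
    # into config['warehouse'], with duplicate crates already removed
    for pallet in pallets:
        pallet.build(config['configuration'])

    # ... check every crate's source for changes and refresh the data
    #     in config['stagingDestination'] ...

    # assume each pallet learned whether any of its crates changed;
    # has_updates is an invented attribute for this sketch
    dirty_pallets = [pallet for pallet in pallets if pallet.has_updates]

    for pallet in dirty_pallets:
        pallet.process()

    # ... stop dependent arcgis services, back up production, copy the fresh
    #     staging fgdbs to config['copyDestinations'], remove the backup,
    #     start the services again ...

    for pallet in dirty_pallets:
        pallet.post_copy_process()

    for pallet in pallets:
        pallet.ship()

    # ... email the results to config['notify'] if config['sendEmails'] is true ...
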

🚜

github.com/agrc/forklift

Ask 🖐 Questions 🙋
Steve Gourley / @steveAGRC / #steveoh / AGRC

https://github.com/steveoh/Presentations