I recently released a gem called forceps. It lets you copy data from remote databases using Active Record. It addresses a problem I have run into many times: selectively importing data from a production database into your local database so you can play with it safely. In this post, I would like to describe how the library works internally. You can check its usage in the README.

The idea

Active Record lets you change the database connection on a per-model basis using the method .establish_connection. Forceps takes each child of ActiveRecord::Base and generates a child class with the same name in the namespace Forceps::Remote. These remote classes also include a method #copy_to_local that copies the record and all its associated models automatically.

The main reason for managing remote Active Record classes is that I wanted to use its reflection and querying support for discovering associations and attributes. A nice side effect is that the library lets you explore remote databases in your local scripts with ease.

Defining remote classes and remote associations

The definition of the child model classes with the remote connection is shown below:

def declare_remote_model_class(klass)
  class_name = remote_class_name_for(klass.name)
  new_class = build_new_remote_class(klass, class_name)
  Forceps::Remote.const_set(class_name, new_class)
  remote_class_for(class_name).establish_connection 'remote'
end

def build_new_remote_class(local_class, class_name)
  Class.new(local_class) do
    ...
    include Forceps::ActsAsCopyableModel
    ...
  end
end
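
The Class.new plus const_set pattern used above can be tried outside Rails. The following is a minimal sketch in plain Ruby, not the forceps code: LocalUser, RemoteModels, Copyable and declare_remote_class are hypothetical names standing in for a local model, the Forceps::Remote namespace, Forceps::ActsAsCopyableModel and the method above, and no database connection is involved.

```ruby
# Hypothetical stand-in for a local model class.
class LocalUser; end

# Hypothetical stand-in for the Forceps::Remote namespace.
module RemoteModels; end

# Hypothetical stand-in for Forceps::ActsAsCopyableModel.
module Copyable
  def copyable?
    true
  end
end

def declare_remote_class(namespace, local_class)
  # Class.new(parent) builds an anonymous subclass. The block is evaluated
  # in the context of the new class, so `include` works as in a class body.
  remote_class = Class.new(local_class) do
    include Copyable
  end

  # Assigning the anonymous class to a constant is what gives it a name.
  namespace.const_set(local_class.name, remote_class)
end

remote_user = declare_remote_class(RemoteModels, LocalUser)

puts remote_user.name        # the class is now named after its namespace
puts remote_user.superclass  # it still inherits from the local class
```

Because the remote class is a real subclass, it inherits everything from the local model, while the included module only affects the remote side.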

With this definition, remote classes let you manipulate isolated remote objects. But the inherited associations are still pointing to their local counterparts. I solved this problem by cloning the association and changing its internal class attribute to make it point to the proper remote class.

def reference_remote_class_in_normal_association(association, remote_model_class)
  related_remote_class = remote_class_for(association.klass.name)

  cloned_association = association.dup
  cloned_association.instance_variable_set("@klass", related_remote_class)

  cloned_reflections = remote_model_class.reflections.dup
  cloned_reflections[cloned_association.name.to_sym] = cloned_association
  remote_model_class.reflections = cloned_reflections
end
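
To see why both the reflection and the hash of reflections are duplicated, here is a small sketch in plain Ruby. FakeReflection, LocalComment and RemoteComment are hypothetical stand-ins, and the real Active Record reflection objects are more involved. The point is only that mutating a copy leaves the parent's data untouched.

```ruby
# Hypothetical, simplified stand-in for an Active Record reflection.
class FakeReflection
  attr_reader :name, :klass

  def initialize(name, klass)
    @name = name
    @klass = klass
  end
end

class LocalComment; end
class RemoteComment < LocalComment; end

# Reflections as the parent (local) class would expose them.
parent_reflections = { comments: FakeReflection.new(:comments, LocalComment) }

# Shallow-copy the reflection and repoint only the copy at the remote class.
cloned = parent_reflections[:comments].dup
cloned.instance_variable_set(:@klass, RemoteComment)

# Copy the hash too, so the entry is replaced for the child only.
child_reflections = parent_reflections.dup
child_reflections[cloned.name] = cloned

puts parent_reflections[:comments].klass # still the local class
puts child_reflections[:comments].klass  # now the remote class
```

Skipping either dup would change the association for the local class as well, since parent and child would share the same objects.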

Cloning trees of active record models

For copying simple attributes, I ended up invoking each setter directly. I intended to do it with mass assignment but disabling its protection in Rails 3 is pretty tricky, as it can be enabled in multiple ways. Rails 4 moved mass-assignment protection to the controllers, but I wanted forceps to support both versions.

def copy_attributes(target_object, attributes_map)
  attributes_map.each do |attribute_name, attribute_value|
    target_object.send("#{attribute_name}=", attribute_value)
  end
end
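
The same setter-by-setter approach works on any Ruby object, which makes it easy to try in isolation. In this sketch Invoice is a hypothetical plain Ruby class, not an Active Record model, and the method uses public_send instead of send (an assumption made here so that private setters are not called by accident) and returns the target for convenience.

```ruby
# Hypothetical plain Ruby class standing in for a model.
class Invoice
  attr_accessor :number
  attr_reader :total

  # A custom setter: it still runs when invoked dynamically.
  def total=(value)
    @total = Float(value)
  end
end

def copy_attributes(target_object, attributes_map)
  attributes_map.each do |attribute_name, attribute_value|
    # Builds the setter name ("number=", "total=") and calls it.
    target_object.public_send("#{attribute_name}=", attribute_value)
  end
  target_object
end

invoice = copy_attributes(Invoice.new, "number" => "A-1", "total" => "12.5")

puts invoice.number
puts invoice.total
```

Since each setter is called individually, any custom logic in a setter runs as usual, and no mass-assignment machinery is involved.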

Cloning associations is done by fetching all the associations of each model class with .reflect_on_all_associations and copying the associated objects according to each association's cardinality. For example, this method copies a has_many association:

def copy_associated_objects_in_has_many(local_object, remote_object, association_name)
  remote_object.send(association_name).find_each do |remote_associated_object|
    local_object.send(association_name) << copy(remote_associated_object)
  end
end

It uses a cache internally to avoid copying objects more than once.
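
The post does not show how that cache is implemented, so the following is only a sketch of the general idea in plain Ruby. Node and deep_copy are hypothetical names. The cache maps each source object to its copy, and the copy is registered before recursing so that shared and cyclic references are copied once.

```ruby
# Hypothetical node type: a name plus references to other nodes.
Node = Struct.new(:name, :children)

# Copies a node and everything reachable from it. The cache is keyed by
# object identity, so each source object is copied at most once.
def deep_copy(node, cache = {}.compare_by_identity)
  return cache[node] if cache.key?(node)

  copy = Node.new(node.name, [])
  cache[node] = copy # register before recursing so cycles terminate
  node.children.each { |child| copy.children << deep_copy(child, cache) }
  copy
end

shared = Node.new("shared", [])
root = Node.new("root", [shared, shared]) # the same child referenced twice
shared.children << root                   # and a cycle back to the root

copied_root = deep_copy(root)

puts copied_root.children.map(&:name).inspect
```

Without the cache, the cycle between root and shared would recurse forever, and the shared child would end up as two separate copies.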

Handling STI and polymorphic associations

Supporting Single Table Inheritance and polymorphic associations turned out to be one of the most challenging parts. Both features rely on a type column containing the model class to instantiate. This column is referenced in multiple places in the Rails codebase, such as in join queries or when instantiating records.

For example, when instantiating objects from queries, Rails uses the hash of attributes obtained from the database. To change the type column in that hash, the .instantiate method is overridden in remote classes:

Class.new(local_class) do
  ...

  if Rails::VERSION::MAJOR >= 4
    def self.instantiate(record, column_types = {})
      __make_sti_column_point_to_forceps_remote_class(record)
      super
    end
  else
    def self.instantiate(record)
      __make_sti_column_point_to_forceps_remote_class(record)
      super
    end
  end

  def self.__make_sti_column_point_to_forceps_remote_class(record)
    if record[inheritance_column].present?
      record[inheritance_column] = "Forceps::Remote::#{record[inheritance_column]}"
    end
  end

  ...
end
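
The effect of rewriting the type column can be illustrated without Rails. In this sketch RemoteSketch, Animal, Dog and point_type_to_namespace are hypothetical names, and Object.const_get stands in for the class lookup Rails performs. Unlike the method above, it returns a modified copy of the row instead of changing it in place.

```ruby
# Hypothetical namespace holding the "remote" versions of an STI hierarchy.
module RemoteSketch
  class Animal; end
  class Dog < Animal; end
end

# Prefixes the stored class name so it resolves inside the given namespace.
# Rows with no type value are returned unchanged.
def point_type_to_namespace(record, type_column, namespace)
  value = record[type_column]
  return record if value.nil? || value.empty?

  record.merge(type_column => "#{namespace}::#{value}")
end

row = { "id" => 1, "type" => "Dog" }
remote_row = point_type_to_namespace(row, "type", "RemoteSketch")

# Resolving the rewritten name now finds the class inside the namespace.
remote_class = Object.const_get(remote_row["type"])

puts remote_row["type"]
puts remote_class
```

The guard for empty values matters because rows of the base class may leave the type column blank, and prefixing a blank value would produce a class name that does not exist.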

Testing against multiple Rails versions

Testing against multiple Rails versions was far easier than I expected. I followed an approach by Richard Schneeman: use an environment variable to configure the Rails version in the .gemspec file:

if ENV['RAILS_VERSION']
  s.add_dependency "rails", "~> #{ENV['RAILS_VERSION']}"
else
  s.add_dependency "rails", "> 3.2.0"
end

And set the target versions in .travis.yml:

env:
  - "RAILS_VERSION=3.2.16"
  - "RAILS_VERSION=4.0.2"

The awesomeness of Travis will do the rest.

Conclusions

One thing I loved about this project is that I started with a very simple idea, without knowing whether it would work with complex real-life models. I wrote a very simple test and handled more and more cases incrementally. It ended up being more complex than I expected, but it is still a pretty compact library thanks to the wonders of Ruby, metaprogramming and Active Record.

The code for Forceps is available on GitHub. Pull requests are welcome.