(Note: this article has been split into two parts, one that gives you an overview and one that actually dissects a small web application I have built in an attempt to comply with GDPR.)

Introduction

First and foremost, here is a repository with a Rails web application I have built to help understand the technical problems GDPR creates:

https://github.com/ziptofaf/gdpr-rails

If you are not familiar with this framework, don't worry: this article tries to explain the reasoning behind it, not just the code.

Data export and retention

The first thing we need in order to get started with GDPR is to know which of our database tables / models actually hold personal information. Since this behaviour is application-wide and spans all models, I have opted to bake it straight into ActiveRecord via ActiveSupport::Concern:

https://github.com/ziptofaf/gdpr-rails/blob/master/lib/gdpr_extension.rb

What we have in here are 5 simple methods that we can override on a per-model basis:

has_personal_information? - GDPR does not apply to models that store no personally identifiable information. We default it to false.

retention_period, can_expire? and outdated_records are connected. As mentioned in the first part of this article, you really should not keep data around forever without a good reason. By setting can_expire? to true, outdated_records lets you find the records that are past their retention period and can be removed. By default records do not expire; when they do, the default retention period is 3 years.
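To make those defaults concrete, here is a minimal plain-Ruby sketch. It is not the repository's gdpr_extension.rb: an ordinary included hook stands in for ActiveSupport::Concern and an in-memory array stands in for the table so it runs without Rails, and the LoginAttempt model with its 30-day period is hypothetical. The 3-year default ignores leap days; in Rails you would simply write 3.years.

```ruby
# Plain-Ruby stand-in for the per-model GDPR defaults. The real project mixes
# an ActiveSupport::Concern into ActiveRecord; this sketch needs no Rails.
module GdprDefaults
  DAY = 24 * 60 * 60 # seconds

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Opt-in: a model holds no personal data unless it says so.
    def has_personal_information?
      false
    end

    # Records never expire by default...
    def can_expire?
      false
    end

    # ...and when they do, the default is roughly 3 years.
    def retention_period
      3 * 365 * GdprDefaults::DAY
    end

    # Records older than the retention period; empty for non-expiring models.
    def outdated_records(now = Time.now)
      return [] unless can_expire?

      all.select { |record| record.created_at < now - retention_period }
    end

    # Forces every model with personal data to define its own export.
    def export_personal_information_from_model(_user_id)
      raise NotImplementedError, "#{name} must define export_personal_information_from_model"
    end
  end
end

# Hypothetical model that keeps login IPs for 30 days.
class LoginAttempt
  include GdprDefaults
  Record = Struct.new(:user_id, :ip, :created_at)

  class << self
    attr_accessor :records

    def all
      records
    end

    def has_personal_information?
      true
    end

    def can_expire?
      true
    end

    def retention_period
      30 * GdprDefaults::DAY
    end
  end
end

now = Time.utc(2018, 5, 25)
LoginAttempt.records = [
  LoginAttempt::Record.new(1, "203.0.113.7", now - 40 * GdprDefaults::DAY),
  LoginAttempt::Record.new(1, "203.0.113.7", now - 2 * GdprDefaults::DAY)
]
puts LoginAttempt.outdated_records(now).size # prints 1
```

Methods defined directly on LoginAttempt's singleton class take precedence over the extended module, which is what makes per-model overrides work here.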

export_personal_information_from_model is there to remind you that you need to define this method in each specific model: it throws an exception if you don't. If you visit https://github.com/ziptofaf/gdpr-rails/blob/master/app/models/user.rb you will find an export_personal_information(user_id) method that goes through every model that has personal information and calls this method on it.
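The aggregation performed by export_personal_information(user_id) can be sketched like this. Everything here is hypothetical scaffolding: the Model base class with its subclass registry stands in for ApplicationRecord and its descendants, Address and Product are made-up models, and the top-level export_personal_information function plays the role of the method in user.rb.

```ruby
require "json"

# Hypothetical base class standing in for ApplicationRecord; it remembers
# every subclass so we can later iterate over all "models".
class Model
  @registry = []

  class << self
    attr_reader :registry

    def inherited(subclass)
      super
      Model.registry << subclass
    end

    def has_personal_information?
      false
    end

    def export_personal_information_from_model(_user_id)
      raise NotImplementedError, "#{name} must define export_personal_information_from_model"
    end
  end
end

# Hypothetical model with personal data.
class Address < Model
  ROWS = [
    { user_id: 1, street: "1 Main St" },
    { user_id: 2, street: "9 Elm St" }
  ].freeze

  def self.has_personal_information?
    true
  end

  def self.export_personal_information_from_model(user_id)
    ROWS.select { |row| row[:user_id] == user_id }
  end
end

# Hypothetical model without personal data; it is skipped entirely.
class Product < Model
end

# Everything we store about one user, keyed by model name.
def export_personal_information(user_id)
  Model.registry
       .select(&:has_personal_information?)
       .map { |model| [model.name, model.export_personal_information_from_model(user_id)] }
       .to_h
end

puts JSON.pretty_generate(export_personal_information(1))
```

In a real Rails app the models would have to be eager-loaded first, since ApplicationRecord.descendants only lists classes that have already been loaded.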

Last but not least, I have also created a simple rake task under https://github.com/ziptofaf/gdpr-rails/blob/master/lib/tasks/retention.rake

remove_expired_records - as the name suggests, this one gets rid of outdated records from all models.
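The heart of such a task is a loop over every model that can expire. The sketch below stays in plain Ruby with a hypothetical in-memory Session model and passes the list of models explicitly; inside a real rake task, if outdated_records returns an ActiveRecord relation, the loop body could shrink to model.outdated_records.destroy_all.

```ruby
# Hypothetical in-memory model that knows its own retention rules.
class Session
  DAY = 24 * 60 * 60 # seconds
  Row = Struct.new(:id, :created_at)

  class << self
    attr_accessor :rows

    def can_expire?
      true
    end

    def retention_period
      14 * Session::DAY
    end

    def outdated_records(now)
      rows.select { |row| row.created_at < now - retention_period }
    end

    # Stand-in for ActiveRecord's destroy: drop the row from the "table".
    def destroy_record(row)
      rows.delete(row)
    end
  end
end

# Core of a remove_expired_records task: purge outdated rows from every model
# that can expire, returning how many rows each model lost.
def remove_expired_records(models, now = Time.now)
  models.select(&:can_expire?).map do |model|
    expired = model.outdated_records(now)
    expired.each { |row| model.destroy_record(row) }
    [model.name, expired.size]
  end.to_h
end

now = Time.utc(2018, 5, 25)
Session.rows = [Session::Row.new(1, now - 20 * Session::DAY), Session::Row.new(2, now - Session::DAY)]
p remove_expired_records([Session], now)
p Session.rows.map(&:id) # prints [2]
```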

With this we have cleared our first hurdle in building a GDPR compliant app. It also showcases a VERY important point: you need to know what you actually store, and where. That isn't a big problem for new projects, but I have seen a codebase with a few HUNDRED models and personal information flying all over them... in which case it can be QUITE bothersome, to say the least.

Encryption and Right to be forgotten

Right to be forgotten is a huge part of GDPR. If a user asks you to delete their info, you have to oblige (unless other laws make this impossible, e.g. you might need to keep their records for fiscal reasons). Doing this inside a live application is one thing, but you also need to handle it within BACKUPS. That is where the insanity starts. Here are some of the available solutions:

  1. Don't store backups older than 30 days (as that's how long you have to delete said data). This might actually be reasonable for certain companies: it requires the least effort and likely no redesign of your infrastructure. However, you will still need a list of users who asked to be deleted in case you have to restore a backup.
  2. Recursively go through your backups and remove all information you have on a given person. Pros: none. Cons: yes.
  3. Keep a separate list of the user_ids that want to be forgotten, stored in a file or database. Back it up often to offsite storage, in a different place than the rest of your application. Whenever you have to restore a database, iterate through the list and get rid of those users once more. Pros: relatively easy to implement. Cons: this data is de facto not gone from your system, just not visible, which might or might not fly under GDPR.
  4. Encrypt each user's personal data with a dedicated key (kept inside a separate database) using strong encryption. When they want to be gone, just get rid of that key. The data is still in the database but not accessible to anyone, not even you. Pros: O(1) performance, and it deals with all the backups as well. Cons: requires a separate database and rewriting some core Rails components.

I have opted for solution #4 in this application. It comes with its own set of difficulties but should satisfy GDPR requirements completely.

We will need two things to make this work: our separate database and something to encrypt/decrypt our data. Fortunately, since 5.0 Rails has pretty much built-in Redis support, and Redis is a great place to keep a {:user_id => :encryption_key} structure. As for encryption/decryption, I will be using the attr_encrypted gem since it's frequently updated.
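Solution #4 is commonly called crypto-shredding, and its core fits in a few lines. This sketch is not the project's Encryptable module and does not use attr_encrypted: it calls Ruby's OpenSSL bindings directly with AES-256-GCM, a plain Hash named KEY_STORE stands in for Redis, and key_for, encrypt_for, decrypt_for and forget are hypothetical helper names.

```ruby
require "openssl"
require "securerandom"

# Stand-in for the Redis {user_id => encryption_key} store.
KEY_STORE = {}

# One random 256-bit key per user, created on first use.
def key_for(user_id)
  KEY_STORE[user_id] ||= SecureRandom.random_bytes(32)
end

# AES-256-GCM; the random 12-byte IV and 16-byte auth tag are stored in
# front of the ciphertext so a single column holds everything.
def encrypt_for(user_id, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key_for(user_id)
  iv = cipher.random_iv
  cipher.auth_data = ""
  ciphertext = cipher.update(plaintext) + cipher.final
  iv + cipher.auth_tag + ciphertext
end

# Without the key there is nothing to decrypt with, in the live database
# and in every backup alike.
def decrypt_for(user_id, blob)
  key = KEY_STORE[user_id]
  return nil if key.nil?

  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = blob[0, 12]
  cipher.auth_tag = blob[12, 16]
  cipher.auth_data = ""
  (cipher.update(blob[28..-1]) + cipher.final).force_encoding(Encoding::UTF_8)
end

# Right to be forgotten: delete the key, leave the rows alone.
def forget(user_id)
  KEY_STORE.delete(user_id)
end

blob = encrypt_for(42, "alice@example.com")
puts decrypt_for(42, blob) # prints alice@example.com
forget(42)
p decrypt_for(42, blob)    # prints nil
```

Deleting one entry from the key store is the whole deletion: the encrypted rows, and every backup containing them, stay byte-for-byte identical but unreadable.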

Note: if you don't like the idea of using Redis for this (a somewhat valid concern; it's not as reliable as a relational database, after all), you can look into sharding (via gems like Octopus or Rails sharding) or multiple databases (e.g. via multiverse). I chose Redis because it's very popular, easy to drop into your application and the least likely to be abandoned anytime soon.

Note #2: you might ALSO add a DeletedUser model (which holds a single integer user_id). That way, if the Redis server died and had to be restored for any reason, you could verify that all forgotten users are indeed gone. Yes, this means there can be a short window in which a few users who requested deletion are accessible once more, but here's the catch: you have 30 days to respond to these requests, and there's no way in hell you would be restoring a Redis backup that is more than 30 days old.
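Re-applying deletions after a Redis restore boils down to dropping every restored key whose owner is on the deletion list (the same idea as solution #3 in the list above). In this sketch the restored key store is a plain Hash, the DeletedUser table is a Set of ids (in Rails it might come from DeletedUser.pluck(:user_id)), and reapply_deletions is a hypothetical helper; against real Redis you would delete the matching keys instead of Hash entries.

```ruby
require "set"

# Re-apply every recorded deletion after restoring the key store from a
# backup. Returns the ids whose keys had come back and were removed again.
def reapply_deletions(key_store, deleted_user_ids)
  resurrected = key_store.keys.select { |user_id| deleted_user_ids.include?(user_id) }
  resurrected.each { |user_id| key_store.delete(user_id) }
  resurrected
end

# Hypothetical restored snapshot: user 7 asked to be forgotten after it was taken.
restored_keys = { 3 => "key-3", 7 => "key-7", 9 => "key-9" }
# Stand-in for the ids stored in the DeletedUser table.
deleted_user_ids = Set[7, 12]

p reapply_deletions(restored_keys, deleted_user_ids) # prints [7]
p restored_keys.keys                                 # prints [3, 9]
```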

Most of our custom logic for encryption will come from a single module named Encryptable:

https://github.com/ziptofaf/gdpr-rails/blob/master/lib/modules/encryptable.rb

This mixin holds all the methods needed to retrieve and create private_keys. It also alters a very annoying default behaviour of attr_encrypted: it throws an exception when you feed it nil as a key, which is exactly what happens once you have complied with a right to be forgotten request (and we probably still want code like User.all to work correctly).

Note: with the key deleted, our models also become immutable by default and will crash on any attempt to save. You can override the encrypt method the same way as decrypt: just wrap it in a begin/rescue block that catches ArgumentError.
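The nil-key workaround can be illustrated with Module#prepend. attr_encrypted's real internals and method signatures are not reproduced here: FakeCipher is a hypothetical stand-in that raises ArgumentError on a nil key, as described above, and its "encryption" is just String#reverse. TolerateMissingKey is the part that mirrors the idea of overriding decrypt and encrypt.

```ruby
# Hypothetical stand-in for an encryption backend that raises ArgumentError
# when the key is nil.
class FakeCipher
  def encrypt(value, key)
    raise ArgumentError, "must specify a key" if key.nil?

    value.reverse # placeholder transformation, NOT real encryption
  end

  def decrypt(value, key)
    raise ArgumentError, "must specify a key" if key.nil?

    value.reverse
  end
end

# Prepended wrapper: a forgotten user (nil key) reads as nil instead of
# crashing, so listing all users keeps working.
module TolerateMissingKey
  def decrypt(value, key)
    super
  rescue ArgumentError
    nil
  end

  # Same trick for writes; whether silently storing nil is acceptable is an
  # application decision.
  def encrypt(value, key)
    super
  rescue ArgumentError
    nil
  end
end

FakeCipher.prepend(TolerateMissingKey)

cipher = FakeCipher.new
p cipher.decrypt("cba", "some-key") # prints "abc"
p cipher.decrypt("cba", nil)        # prints nil
```

Rescuing ArgumentError also swallows unrelated argument errors; checking key.nil? before calling super is a narrower alternative.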

To add a bit of extra security (as suggested by a reddit user, thanks!), our key is a composite one: 224 bits are stored inside Redis, while the remaining 32 bits are stored inside our application (you can find them in config/secrets.yml), for a 256-bit key in total. This isn't a big deal (although it might help if someone got hold of a copy of the Redis database), but it also doesn't cost us anything.
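Splitting the key works out to 224 + 32 = 256 bits, exactly the key size AES-256 expects. The sketch below assumes the Redis part comes first and the application part is appended; the constant and method names are hypothetical, and APP_KEY_PART is generated at boot only to keep the example self-contained. It also shows why the gain is modest: with the Redis part in hand, only 2^32 (about 4.3 billion) candidates remain for the application part.

```ruby
require "securerandom"

REDIS_PART_BYTES = 28 # 224 bits, stored per user in Redis
APP_PART_BYTES = 4    # 32 bits, shared by the whole application

# Hypothetical: generated here only to keep the example self-contained; the
# project keeps this fragment in config/secrets.yml.
APP_KEY_PART = SecureRandom.random_bytes(APP_PART_BYTES)

# What would be stored in Redis for a new user.
def new_redis_key_part
  SecureRandom.random_bytes(REDIS_PART_BYTES)
end

# Redis part followed by the app part: 28 + 4 = 32 bytes = 256 bits.
def composite_key(redis_part, app_part = APP_KEY_PART)
  unless redis_part.bytesize == REDIS_PART_BYTES && app_part.bytesize == APP_PART_BYTES
    raise ArgumentError, "key parts must be #{REDIS_PART_BYTES} and #{APP_PART_BYTES} bytes"
  end

  redis_part + app_part
end

puts composite_key(new_redis_key_part).bytesize # prints 32
```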

That being said, per-row encryption unique to each user is problematic. For instance, if you use Devise for user registration/login, it's going to break spectacularly: emails ARE considered personally identifiable information, meaning they should be encrypted. It works in the end, but it required quite a lot of custom code (then again, that's still better than rolling your own authentication system):

https://github.com/ziptofaf/gdpr-rails/blob/master/app/models/user.rb

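There's also a performance problem: if emails are encrypted, checking whether one is unique (or even whether it exists) is an O(n) operation, since every user's email is encrypted with a different key and the database has nothing it can compare. So you effectively have to do this: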

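def self.find_by_email(email)
  # Each email is encrypted with its owner's key, so every record has to be
  # loaded and decrypted in Ruby; find_each does this in batches.
  User.find_each.find { |user| user.email == email }
end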

Ouch! A potential solution would be to add an email_hash column to your model and look users up by that instead. In this sample project I have chosen to accept an O(n) email lookup during registration, but to use a non-encrypted username field for everything else (HOWEVER, there is also a separate branch, email-login, that does exactly what the name suggests). Still, this problem will propagate to your other models as well, as it de facto breaks all kinds of grouping and text searching at the database layer. A better alternative would be to handle this directly in the database, but unfortunately I am not familiar with good solutions in this regard.
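An email_hash column is usually implemented as a keyed hash (sometimes called a blind index): the same address always produces the same value, so the database can index it and check uniqueness. This is a sketch under assumptions: HMAC-SHA256 with a secret application key (EMAIL_HASH_KEY is a placeholder), emails normalized by stripping and downcasing, and an in-memory USERS array instead of a table. An unkeyed hash would be easy to reverse by hashing lists of known addresses, hence the secret.

```ruby
require "openssl"

# Hypothetical application secret; keep it out of the database so a leaked
# dump cannot be brute-forced with a list of known addresses.
EMAIL_HASH_KEY = "replace-with-a-long-random-secret"

# Deterministic keyed hash of a normalized email: the same address always
# maps to the same value, so the database can index and compare it.
def email_hash(email)
  OpenSSL::HMAC.hexdigest("SHA256", EMAIL_HASH_KEY, email.strip.downcase)
end

# Stand-in for a users table with an indexed email_hash column.
USERS = [
  { id: 1, email_hash: email_hash("alice@example.com") },
  { id: 2, email_hash: email_hash("bob@example.com") }
].freeze

# With ActiveRecord this would be User.find_by(email_hash: email_hash(email)),
# an indexed lookup instead of decrypting every row.
def find_by_email(email)
  digest = email_hash(email)
  USERS.find { |user| user[:email_hash] == digest }
end

p find_by_email(" Alice@Example.COM ")[:id] # prints 1
```

One caveat: the hash is derived from the email rather than encrypted with the user's key, so deleting that key does not make it disappear; the email_hash value would need to be cleared explicitly when a user is forgotten.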

On the plus side, if you look at https://github.com/ziptofaf/gdpr-rails/blob/master/app/models/mock_order.rb you will see that VERY little is needed to create a model with two encrypted custom fields, and they work just as you would expect in a typical Rails application.

User consents

Under GDPR you de facto can't rely on plain "OK" buttons or pre-checked checkboxes. You need one checkbox, which users have to tick manually, for each kind of information you want to store about them.

However, there's a bit more to it than that. What if your ToS changes? You must have a way of tracking which version of a specific consent your users agreed to.

I have opted for 3 separate models to accomplish this: ConsentCategory, UserConsent and Consent. A ConsentCategory can look like this:

{"id":2,"name":"cookies","created_at":"2018-04-03T18:01:20.851Z", "updated_at":"2018-04-03T18:01:20.851Z","mandatory":true, "shortened_description":"I agree for use of cookies on this site to distinguish me from other users"}

Consents are basically "versions" of a given ConsentCategory. Each has a full, detailed description (which might also be your changelog, markup, HTML etc.).

Then there are UserConsents, i.e. which ConsentCategory users agreed to and when exactly that happened. This way you can always keep track of who agreed to which version of your ToS. If you need more fine-grained control (e.g. you need to know that a user signed up for versions 3, 4 and 7 of your adware campaign), you could connect UserConsents to Consents directly rather than to their categories.

When you create a new Consent, it also marks all UserConsents with requires_revalidation: true, which should trigger some kind of action in your application (e.g. showing users a form to fill in when they log in).
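The versioning and revalidation flow can be sketched with plain Structs. Field names such as mandatory and requires_revalidation come from the text above; the version counter, the in-memory CONSENTS and USER_CONSENTS arrays, the helper names, and the choice to flag only UserConsents of the same category are assumptions. In Rails this would naturally live in an after_create callback on Consent, for example using update_all(requires_revalidation: true) on the matching UserConsents.

```ruby
# Plain Structs standing in for the three ActiveRecord models.
ConsentCategory = Struct.new(:id, :name, :mandatory)
Consent = Struct.new(:category_id, :version, :description)
UserConsent = Struct.new(:user_id, :category_id, :agreed_at, :requires_revalidation)

CONSENTS = []      # every published version of every category
USER_CONSENTS = [] # who agreed to which category, and when

# Publishing a new version flags everyone who agreed to that category
# (assumption: other categories are left alone).
def publish_consent(category, description)
  version = CONSENTS.count { |consent| consent.category_id == category.id } + 1
  consent = Consent.new(category.id, version, description)
  CONSENTS << consent
  USER_CONSENTS.each do |user_consent|
    user_consent.requires_revalidation = true if user_consent.category_id == category.id
  end
  consent
end

# True if the user should see the consent form again on their next login.
def needs_revalidation?(user_id)
  USER_CONSENTS.any? { |uc| uc.user_id == user_id && uc.requires_revalidation }
end

cookies = ConsentCategory.new(2, "cookies", true)
publish_consent(cookies, "First version of the cookie policy")
USER_CONSENTS << UserConsent.new(1, cookies.id, Time.now, false)
publish_consent(cookies, "Second version: adds analytics cookies")
p needs_revalidation?(1) # prints true
```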

If you try this little project, you will see that these fields do show up during registration (in a somewhat hacky way, but it works). There are also two distinct types of consents, mandatory and optional; the latter does not need to be ticked to continue (agreeing to a newsletter, for example, fits this category).

Conclusions and extras

That's about it. One more important topic to explore would be auditing (you should record which employees accessed and/or altered personal information, and when), but there are plenty of good solutions in the Rails ecosystem already. Just read the documentation of the audited gem to get started; there's no real point in me repeating it.

Also remember to prune your logs. What reaches the database will indeed stay encrypted, but the same does not necessarily apply to your logs. So look into your filter_parameter_logging.rb file and add the necessary symbols to it, e.g.:

https://github.com/ziptofaf/gdpr-rails/blob/master/config/initializers/filter_parameter_logging.rb
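In Rails the list itself lives in Rails.application.config.filter_parameters, and names are matched partially, so :email also masks a key like email_confirmation. To show what ends up in the logs, the function below imitates that behaviour in a simplified form; FILTERED_KEYS and filter_params are hypothetical names, not the project's configuration.

```ruby
# Hypothetical list mirroring what you might add to filter_parameter_logging.rb.
FILTERED_KEYS = %i[password email username].freeze

# Simplified imitation of Rails' log filtering: any key whose name contains
# one of FILTERED_KEYS gets its value masked, at any nesting depth.
def filter_params(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, inner), filtered|
      sensitive = FILTERED_KEYS.any? { |name| key.to_s.include?(name.to_s) }
      filtered[key] = sensitive ? "[FILTERED]" : filter_params(inner)
    end
  when Array
    value.map { |item| filter_params(item) }
  else
    value
  end
end

params = { "user" => { "username" => "ann", "email" => "ann@example.com", "age" => 30 } }
p filter_params(params)
```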

Q: What about personal information that you HAVE to track, like IP addresses?
A: Frankly, the fact that an IP address can be considered PII under some conditions is stupid, but from what I was told, on its own it isn't a problem; it only becomes one when bundled with other information. The fact that you gather it should be covered in your terms of service and in your GDPR compliance document. But long story short: it's essential for your service to function and to protect against bad guys, so you can store it implicitly, as long as you don't do it indefinitely. Alas, the specifics of GDPR actually differ from country to country, so DO NOT take my word for it.

Q: Do I ALWAYS have to have explicit user consent for everything?
A: No. There is an "implicit" consent category. For instance, if you run a web store, you don't need to tell users that you will keep their address information to fulfil their order; it's considered obvious. If you wanted to keep it afterwards, however, you should mention it and ask for permission.