How to find empty columns in your db by makaroni4
So I have quite old #rails app with #mysql database. From the first look at the db schema I understood that there are quite a lot of useless columns so I wrote a simple #ruby script which uses #activerecord to check all the empty columns.
require 'active_record'
def query table, column
"select #{table}.#{column.name} from #{table} where #{table}.#{column.name} is not null or #{table}.#{column.name} != '' limit 1"
end
ActiveRecord::Base.establish_connection(
adapter: 'mysql2',
database: db_name,
host: 'localhost',
username: username,
pool: 5
)
connection = ActiveRecord::Base.connection
connection.tables.each do |table|
connection.columns(table).each do |column|
next if column.name == 'id'
if ActiveRecord::Base.connection.execute(query table, column).to_a.empty?
puts "#{table} | #{column.name}"
end
end
end
As you can see it just uses two ActiveRecord methods - tables and columns.
Comments
releu commented 7 months ago
Why are you using
select distinctand checking all rows? Use. Something like that
makaroni4 commented 7 months ago
@releu totally agree, have fixed that
makaroni4 commented 7 months ago
Now it works amazingly fast.
releu commented 7 months ago
U r welcome :)
apeiros commented 7 months ago
You can use an
existsquery instead of limit.divineforest commented 7 months ago
#!/usr/bin/env ruby
at the top :)
divineforest commented 7 months ago
#{table}.#{column.name} != '' this will not work for integer column. At least in PG
divineforest commented 7 months ago
DRY:
ActiveRecord::Base.establish_connection( YAML.load(File.open("config/database.yml"))["development"] )
releu commented 7 months ago
maybe
table.column::string != ''?bastilian commented 7 months ago
i often see in our DB a field that we, at some point stopped using for some reason, forgot to delete it or think we wanna keep the data, and the column is filled with a few entries, but most of it is NULL or ''. maybe instead of looking for completely empty columns why not get the amount of usage. also some are meant to be NULL...
makaroni4 commented 7 months ago
@bastilian agree, great idea to display percentage of empty columns and sort by them by this number. Current version of script takes just couple of seconds to run and I think that computation of amount of usage will take time.
bastilian commented 7 months ago
@makaroni4 yes. computation time will increase definitely. hm... currently i can not think of a solution that would be faster than doing to selects with counts and calculate in ruby.