How to find empty columns in your db by makaroni4

So I have quite old #rails app with #mysql database. From the first look at the db schema I understood that there are quite a lot of useless columns so I wrote a simple #ruby script which uses #activerecord to check all the empty columns.

require 'active_record'

def query table, column
  "select #{table}.#{column.name} from #{table} where #{table}.#{column.name} is not null or #{table}.#{column.name} != '' limit 1"
end

ActiveRecord::Base.establish_connection(
  adapter: 'mysql2',
  database: db_name,
  host: 'localhost',
  username: username,
  pool: 5
)

connection = ActiveRecord::Base.connection

connection.tables.each do |table|
  connection.columns(table).each do |column|
    next if column.name == 'id'

    if ActiveRecord::Base.connection.execute(query table, column).to_a.empty?
      puts "#{table} | #{column.name}"
    end
  end
end

As you can see it just uses two ActiveRecord methods - tables and columns.

Similar posts

Comments

releu commented 7 months ago

757fb0d5ec7560b6f25f5bd98eadc020?size=52

Why are you using select distinct and checking all rows? Use

select table.foo from table where table.foo is not null and table.foo != '""' limit 1

. Something like that

makaroni4 commented 7 months ago

E302c3320cd14b02cbe237b479d7f884?size=52

Now it works amazingly fast.

releu commented 7 months ago

757fb0d5ec7560b6f25f5bd98eadc020?size=52

U r welcome :)

apeiros commented 7 months ago

33e47178a56073c9b4bff0d2b9b3cb70?size=52

You can use an exists query instead of limit.

divineforest commented 7 months ago

Be91a535ad5acb3bb34f16401d850658?size=52

#!/usr/bin/env ruby

at the top :)

divineforest commented 7 months ago

Be91a535ad5acb3bb34f16401d850658?size=52

#{table}.#{column.name} != '' this will not work for integer column. At least in PG

divineforest commented 7 months ago

Be91a535ad5acb3bb34f16401d850658?size=52

DRY:

ActiveRecord::Base.establish_connection( YAML.load(File.open("config/database.yml"))["development"] )

releu commented 7 months ago

757fb0d5ec7560b6f25f5bd98eadc020?size=52

maybe table.column::string != '' ?

bastilian commented 7 months ago

487b7ae6c8286c3062ed7ee174410865?size=52

i often see in our DB a field that we, at some point stopped using for some reason, forgot to delete it or think we wanna keep the data, and the column is filled with a few entries, but most of it is NULL or ''. maybe instead of looking for completely empty columns why not get the amount of usage. also some are meant to be NULL...

makaroni4 commented 7 months ago

E302c3320cd14b02cbe237b479d7f884?size=52

@bastilian agree, great idea to display percentage of empty columns and sort by them by this number. Current version of script takes just couple of seconds to run and I think that computation of amount of usage will take time.

bastilian commented 7 months ago

487b7ae6c8286c3062ed7ee174410865?size=52

@makaroni4 yes. computation time will increase definitely. hm... currently i can not think of a solution that would be faster than doing to selects with counts and calculate in ruby.

apeiros
bastilian
divineforest
makaroni4
releu