This document explains how to install the OpenAustralia parser. It assumes you’ve already installed the web application. The parser generates XML files, which are loaded into the web application’s database. If all you need is the XML files and you aren’t interested in installing and configuring the parser, you can just download them from data.openaustralia.org.
On Mac OS X, install DarwinPorts and then use it to install ImageMagick and ghostscript:
$ sudo port install ImageMagick
$ sudo port install ghostscript
Note: the previous step takes a long while to complete, so make yourself a coffee (or two).
Install the required rubygems:
$ sudo gem install -y mechanize -v 0.9.2
$ sudo gem install -y builder rmagick rcov htmlentities rspec activesupport log4r hpricot
Note: Currently OpenAustralia requires an older version of mechanize (0.9.2), but this might change in the future.
On Debian or Ubuntu, use apt-get to install the requirements:
$ sudo apt-get install imagemagick libmagick9-dev ghostscript ruby rubygems ruby1.8-dev libxslt1-dev
Install the required rubygems:
$ sudo gem install mechanize -v 0.9.2
$ sudo gem install activesupport -v 2.3.7
$ sudo gem install builder rmagick rcov htmlentities rspec log4r hpricot json
Note: Currently OpenAustralia requires an older version of mechanize (0.9.2), but this might change in the future.
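Because the parser depends on a specific mechanize version, it can be worth confirming that the right version actually got installed. The following is only a sketch: has_gem_version is a hypothetical helper (not part of the parser) that inspects the text printed by `gem list`, assuming the usual output format of a gem name followed by its versions in parentheses.

```shell
# Sketch: check whether `gem list` output mentions a given gem version.
# has_gem_version is a hypothetical helper, not part of the parser.
# Usage: has_gem_version "<gem list output>" <gem name> <version>
has_gem_version() {
    listing=$1
    name=$2
    version=$3
    # `gem list` prints lines such as "mechanize (2.7.5, 0.9.2)".
    # Extract the version list for the named gem, split it into one
    # version per line, and look for an exact match.
    printf '%s\n' "$listing" |
        sed -n "s/^$name (\(.*\))\$/\1/p" |
        tr ',' '\n' |
        tr -d ' ' |
        grep -Fqx "$version"
}
```

For example, `has_gem_version "$(gem list mechanize)" mechanize 0.9.2 && echo "mechanize 0.9.2 is installed"` prints the message only when that exact version is present.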
On Windows, install Ruby using RubyInstaller for Windows (choose the one-click installer option).
In addition to the Ruby gems required above you’ll need to install Ruby-MySQL, which can be downloaded from http://www.tmtm.org/en/ruby/mysql/.
Connecting to MySQL 5.0+ from Ruby fails because of a change in password hash length, so you will have to recompile with the include path set to the new MySQL C library.
Copy the example configuration:
$ cd openaustralia-parser
$ cp configuration.yml.example configuration.yml
The only configuration necessary is to change the web root if you have installed the web application in another location. That value is web_root in the file you just copied, configuration.yml.
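If you want to double-check the setting before going further, something like the following may help. It is a rough sketch under a stated assumption: that web_root sits on a single line of the form `web_root: /some/path` with no quotes or trailing comment. The actual layout of configuration.yml may differ, and get_web_root and check_web_root are hypothetical helpers, not part of the parser.

```shell
# Sketch: print the web_root value from a YAML configuration file.
# Assumes a single unquoted line of the form "web_root: /some/path".
get_web_root() {
    config_file=$1
    sed -n 's/^[[:space:]]*web_root:[[:space:]]*//p' "$config_file" | head -n 1
}

# Succeeds only if web_root is set and names an existing directory.
check_web_root() {
    root=$(get_web_root "$1")
    [ -n "$root" ] && [ -d "$root" ]
}
```

Running `check_web_root configuration.yml || echo "web_root is missing or not a directory"` gives a quick warning if the path needs editing.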
Before you can run the parser, you will need to create the directories that will hold the images of the MPs.
$ mkdir -p pwdata/images/mps pwdata/images/mpsL
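The image download step writes into these directories, so it can save time to confirm they exist and are writable first. A minimal sketch follows. ensure_image_dirs is a hypothetical helper, and the optional base directory argument is an assumption added for illustration (it defaults to the current directory, matching the command above).

```shell
# Sketch: create the MP image directories under a base directory and
# confirm each one is writable. ensure_image_dirs is a hypothetical helper.
ensure_image_dirs() {
    base=${1:-.}
    for dir in pwdata/images/mps pwdata/images/mpsL; do
        # mkdir -p is a no-op if the directory already exists.
        mkdir -p "$base/$dir" || return 1
        if [ ! -w "$base/$dir" ]; then
            echo "not writable: $base/$dir" >&2
            return 1
        fi
    done
}
```

Calling `ensure_image_dirs` with no argument has the same effect as the mkdir command above, plus the writability check.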
You are now ready to create the members information by running:
$ ./parse-members.rb
# you should see messages on the console similar to the following
Reading members data...
Running consistency checks...
Writing XML...
Replacing existing member with new data for 5
This is for your information only, just check it looks OK.
$VAR1 = [ '5', '10006', 1, '', 'Albert', 'Adermann', 'Fisher', 'National Party', '1972-12-02', '1984-12-01', 'general_election', 'elected_elsewhere' ];
[...]
To download the members’ images (this will take a while):
$ ./member-images.rb
If you want, though it is not particularly important initially, you can also download the links information (which goes on the Representatives’ and Senators’ pages) by running:
$ ./parse-member-links.rb
To download the Hansard data (the speeches) for one day, say 20 September 2007, and load it into the database:
$ ./parse-speeches.rb 2007.09.20
parse-speeche: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:27
db loading 2007-09-20
db loading 2007-09-20
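If you want several days of data, one simple approach is to run the script once per day. The sketch below only prints the commands rather than running them, so you can inspect them first. It assumes the script is invoked with a single date in the dotted YYYY.MM.DD form shown above; speech_commands is a hypothetical helper, and whether the script accepts other argument forms is not covered here.

```shell
# Sketch: print one parse-speeches.rb command per date, without running anything.
# Dates are given in ISO form (YYYY-MM-DD) and converted to the dotted
# form (YYYY.MM.DD) used by the single-day example.
speech_commands() {
    for day in "$@"; do
        dotted=$(printf '%s' "$day" | sed 's/-/./g')
        printf './parse-speeches.rb %s\n' "$dotted"
    done
}
```

For example, `speech_commands 2007-09-19 2007-09-20` prints two commands, which you could pipe to `sh` from the parser directory once you are happy with them.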
You should now be able to view the results at your webserver URL, dev.openaustralia.org. You should see a version of openaustralia.org populated with data.
Congratulations, you’ve got a mostly complete running version of OpenAustralia! Give yourself a big pat on the back.