Wednesday, November 25, 2009

SOAP Testing over HTTPS, using Siege instead of Ab

I've been using ab for load testing for a while now. Recently, however, I ran into trouble because ab doesn't work with HTTPS out of the box (at least on my Ubuntu install). That triggered an investigation into other tools, and Siege fit the bill nicely. Basically, I needed a way to perform an HTTP POST with a SOAP body.

On Ubuntu, installing it is a simple "apt-get install siege".

Then, you create a siegerc file:

verbose = true
logging = true
protocol = HTTP/1.1
connection = keep-alive
concurrent = 50 #Number of concurrent requests
file = ./urls.txt
delay = 0 #If you are using for benchmarking value should be 0
benchmark = true


The settings are fairly obvious. Note that the file parameter points to an additional file you'll need, which lists out all of the urls you want included in your test. Below is an example:

https://foo.com/ws/Blah POST < soapMessage.xml


Notice that after the URL we specify "POST", which makes the request a POST instead of a GET. The "<" then tells siege to read the request body from the XML file that contains the SOAP message.

Then we can simply invoke siege with:

siege --rc=./siegerc --header="Authorization:Basic BASGHAJSG78236ds"


Notice, we are supplying an extra header for Basic Auth.
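The value after "Basic" in that header is just the base64 encoding of "username:password". As a rough sketch of how to produce it (the function name and the credentials here are made up for illustration; they are not from my setup), something like this should do:

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Auth header line from a username and password."""
    # Basic Auth is the base64 encoding of "username:password".
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Authorization: Basic " + token


if __name__ == "__main__":
    # Hypothetical credentials, purely for illustration.
    print(basic_auth_header("Aladdin", "open sesame"))
    # prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The resulting string is what you would hand to siege's --header option.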

Tuesday, September 22, 2009

MySQL: LEFT Join with Constraints

I just fixed a bug in a colleague's select statement. And I think it makes a good example. So here it is:

You have two tables foo and bar. You are performing different types of analysis on the items in foo, and writing that analysis to bar. Foo has a primary key "id". And bar has three columns: "foo_id", "analysis" and "version". The bar table contains the following entries.

1, blah, 3
1, erg, 2
2, erg, 2

This means that item 1 in foo has been analyzed for blah and erg versions 3 and 2 respectively. Item 2 has been analyzed for only erg version 2.

Now, let's say there are three items in foo and you want a query to pull back all of the items that need analysis for blah version 3. The right answer is items 2 and 3.

You may start with:
select * from foo LEFT JOIN bar ON bar.foo_id = foo.id

This returns 4 rows:
1, blah, 3
1, erg, 2
2, erg, 2
3, null, null

From this resultant table there is no good way to select foo.id's that have not been processed by the right version of blah. However, if we add a join constraint to our select we can. The select becomes:
select * from foo LEFT JOIN bar ON bar.foo_id = foo.id AND bar.analysis = 'blah'


This returns a beautiful table:
1, blah, 3
2, null, null
3, null, null

Essentially the left join takes place with only a subset of bar that meets the join constraint.

NOTE: This is entirely different from the following:
select * from foo LEFT JOIN bar ON bar.foo_id = foo.id WHERE bar.analysis = 'blah'


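That query will only return the single row, because the WHERE clause is applied after the join and throws away the rows where bar is null. To manipulate the original join you would probably end up using a "NOT EXISTS" or "NOT IN" kind of subquery. Yuck.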

The JOIN with a constraint presents a nice clean simple solution that lets us figure out which foo items still need analysis.
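To actually pull back just the foo items that still need analysis, one way to finish the thought is to also constrain the version in the join and then keep only the rows where nothing matched. Here is a small sketch of that. Two loud assumptions: it uses Python's built-in SQLite as a stand-in for MySQL so it can run anywhere, and the "version" constraint plus the "IS NULL" filter are my additions on top of the queries above. The helper name is made up.

```python
import sqlite3


def items_needing_analysis(conn, analysis, version):
    """Return ids in foo with no bar row for the given analysis and version."""
    rows = conn.execute(
        """
        SELECT foo.id
        FROM foo
        LEFT JOIN bar
          ON bar.foo_id = foo.id
         AND bar.analysis = ?
         AND bar.version = ?
        WHERE bar.foo_id IS NULL   -- keep only foo rows the join could not match
        ORDER BY foo.id
        """,
        (analysis, version),
    ).fetchall()
    return [row[0] for row in rows]


# Same data as the example above, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (id INTEGER PRIMARY KEY);
    CREATE TABLE bar (foo_id INTEGER, analysis TEXT, version INTEGER);
    INSERT INTO foo (id) VALUES (1), (2), (3);
    INSERT INTO bar VALUES (1, 'blah', 3), (1, 'erg', 2), (2, 'erg', 2);
    """
)

if __name__ == "__main__":
    print(items_needing_analysis(conn, "blah", 3))  # [2, 3]
```

The join constraints decide which bar rows are allowed to match; the WHERE then only has to ask whether anything matched at all.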

Thursday, September 10, 2009

ERROR 1005 (HY000): Rails Migrations for Unsigned Integers


ERROR 1005 (HY000): Can't create table './*/#sql-3e29_1209.frm' (errno: 150)


If you are getting this when trying to add a foreign key constraint, make sure the columns are of the same type. This includes attributes! Specifically, pay attention to signed vs. unsigned columns. Just because two columns are both integers (or int(10) or int(11)), you won't be able to create a foreign key constraint between them unless they have the same signedness.

Furthermore, in a Rails migration, to get an unsigned column use the following:

create_table :tblFoo, :primary_key => :UID do |t|
  t.column 'Bar', 'int(10) unsigned', :null => false
  t.timestamps
end

Wednesday, September 9, 2009

Access denied for user 'debian-sys-maint'@'localhost'

I wanted to get replication set up to do backups offline, as recommended by this article. I did the backup and loaded it onto the slave. When I went to restart, I received:

ERROR 1045 (28000): Access denied for user 'debian-sys-maint'@'localhost'


It turns out I backed up the users table as well, so it blew away the debian user. I found the fix here.

The trick is to reset the debian-sys-maint user's password: look in /etc/mysql/debian.cnf and run the following, putting whatever password you find in there between the quotes after IDENTIFIED BY:


GRANT ALL PRIVILEGES ON *.* TO 'debian-sys-maint'@'localhost' IDENTIFIED BY '' WITH GRANT OPTION;

Tuesday, July 28, 2009

Global Subversion Ignore Settings (Used for Eclipse Project Files)

I hate having to run a global svn propset ignore command. Instead it is much easier to open up .subversion/config and edit the following in the file.

global-ignores = .project .target .classpath .settings *.o *.lo *.la #*# .*.rej *.rej .*~ *~ .#* .DS_Store


Notice that I added all of the Eclipse files to the ignore statement. Once you update that file, that should be it. Next time you run svn status (or anything else) it will take those ignore patterns into account.

Java Wordnet Library (JWNL) Jar file Repo Entry

Here is the maven pom entry for Java Wordnet Library (JWNL)

<dependency>
<groupId>net.didion</groupId>
<artifactId>jwnl</artifactId>
<version>1.4</version>
</dependency>

Monday, July 27, 2009

CSV (Comma Separated Values) Processing in Ruby

FasterCSV rocks. You can find it here:
http://fastercsv.rubyforge.org/

Start by installing it using the gem.

sudo gem install fastercsv


After that, you are all set. Just make sure you require rubygems first.

require 'rubygems'
require 'fastercsv'

i = 0
FasterCSV.foreach('blogs.csv') do |row|
  i += 1              # running count of rows read
  puts("#{row[2]}")   # print the third column of each row
end

As you can see from above, the row is an array that contains the values from the CSV file.

ActiveRecord outside of Rails (even with ODBC)

There are three quick lines that you need in order to use ActiveRecord outside of Rails. First, you need to load gems, then you can load ActiveRecord. Then, you can pick and choose which of your models to use.


require 'rubygems'
require 'activerecord'
require @@RAILS_APP_HOME + '/app/models/foo.rb'


Then, you'll need this little snippet to establish the connection to the database for ActiveRecord:

require 'rubygems'
require 'activerecord'
require 'yaml'

@@DATABASE_CONFIGURATION = YAML::load(File.open(File.dirname(__FILE__) + '/config/databases.yml'))

def establish_connection(database)
  dbconfig = @@DATABASE_CONFIGURATION
  ActiveRecord::Base.establish_connection(dbconfig[database])
  # ActiveRecord::Base.logger = Logger.new(STDERR)
  if (dbconfig['mode'] == 'odbc')
    puts("Connecting to [#{database}]: ODBC,"+
         " DSN=#{dbconfig[database]['dsn']}/#{dbconfig[database]['adapter']}"+
         " [user=#{dbconfig[database]['username']}]")
  else
    puts("Connecting to [#{database}]: #{dbconfig[database]['adapter']}, "+
         "#{dbconfig[database]['database']}@#{dbconfig[database]['host']}"+
         " [user=#{dbconfig[database]['username']}]")
  end
end

def remove_connection
  ActiveRecord::Base.remove_connection
end


I put the above snippet in a central Ruby file, then require that file anywhere I need to use the ActiveRecord objects. After a call to establish_connection, you can start using any model you've imported. Note that a slightly different connection string is printed for ODBC.

Wednesday, July 22, 2009

Hadoop: java.io.IOException: Type mismatch in key from map

We've been working with Hadoop for a while now, and inevitably newbies run into this error the first time they go to create their own Hadoop job. If you are running into this error, it is most likely a mismatch between your Map and/or Reduce implementation and the job configuration.

Your Map implementation probably looks something like this:

public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    ...
  }
}


Now, your map and reduce phases can have different output types, and that is what sometimes causes the problem. If you don't set the map output classes explicitly, Hadoop assumes the map output types are the same as the job output types. So if your phases produce different types, be sure to declare both sets of output classes in the JobConf when configuring your job:

// Set the outputs for the Map
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);

// Set the outputs for the Job
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(ArrayWritable.class);


Hope that saves people some time.

Finding a java class in a Jar file (or a set of files)

Often classpath problems are hard to diagnose. Sometimes you pick up an errant class on the classpath that conflicts with a version of the class that you need. (Very evil people sometimes rip apart jars and package all of their dependent classes together in a single jar)

Anyhow, however it happens, it is sometimes necessary to get a list of all classes everywhere, in all jar files. Then you can search that list for duplicate instances of a class.

I can't tell you how many times I've used this trick, especially with the sometimes unclear world of what is packaged into the JDK, application server, and what is in the actual application.

Use this:

find . -name '*.jar' -exec unzip -l {} \; > all_classes.txt


That line finds all the jar files recursively from the current working directory, lists the contents of each archive, and redirects that output to a text file that can be searched.
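If you want the duplicates picked out for you instead of searching the text file by hand, the same idea can be sketched in a few lines of Python using the standard zipfile module (jars are just zip files). This is only a sketch; the function name is made up, and it treats any two entries with the same path as duplicates without comparing their contents.

```python
import os
import zipfile
from collections import defaultdict


def find_duplicate_classes(root):
    """Map each .class entry found in more than one jar under root to those jars."""
    locations = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, filename)
            # A jar is a zip archive, so zipfile can list its entries.
            with zipfile.ZipFile(jar_path) as jar:
                for entry in jar.namelist():
                    if entry.endswith(".class"):
                        locations[entry].append(jar_path)
    # Keep only the classes that show up in two or more jars.
    return {name: sorted(jars) for name, jars in locations.items() if len(jars) > 1}


if __name__ == "__main__":
    for name, jars in sorted(find_duplicate_classes(".").items()):
        print(name)
        for jar in jars:
            print("    " + jar)
```

Run it from the same directory you would have run the find command from.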

Handy voodoo.

Sunday, July 19, 2009

UnsatisfiedLinkError with Surefire (on Mac OS X)

Wow, this was a needle in a haystack. I recently needed to use LinkGrammar on my Mac, and I wanted to use it via the Java Native Interface (JNI). I had been using LinkGrammar on my Linux (Ubuntu) boxes for some time, so I was no stranger to compiling it with Java support. I even tested the compilation and install with:

nm /usr/local/lib/liblink-grammar-java.a | grep Java


However, when I was running maven, I received an UnsatisfiedLinkError. After MUCH googling I found:
http://www.nabble.com/Trouble-with-Java-Native-Libraries-td20293666.html

Simply setting the system property in maven for surefire is not sufficient, because maven changes the system property at runtime, which is too late for the VM to link to the library. Thus you need to use the following in your build section of your pom:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkMode>once</forkMode>
    <workingDirectory>target</workingDirectory>
    <argLine>-Djava.library.path=/usr/local/lib</argLine>
  </configuration>
</plugin>


After that, the VM will link to your libraries and surefire and all dependent tests should be able to see the java interfaces and access the necessary libraries.

Thursday, July 16, 2009

Rails Error: no such file to load -- application

I recently deployed an application I was building to production and received the following error:

no such file to load -- application


It turns out that Rails changed the name of the application controller between versions. My production environment was 2.2.2 and my development environment was 2.3.2. Renaming the application_controller file fixed everything.

mv app/controllers/application_controller.rb app/controllers/application.rb

Rake Production (Specifying an Environment to Rake)

I always forget how to specify an environment for rake commands. So I thought I would capture it here because, as it turns out, it is a hard thing to google.

rake RAILS_ENV=production db:migrate

Exec format error (in cron)

If you are like me, you sometimes get lazy and forget to include the shebang line at the top of your scripts. This usually isn't a problem, but in certain cases (where the shell isn't set, or doesn't exist yet) it will cause problems. So, even if your script is set to be executable (chmod +x), you'll receive an error like:

Exec format error


Just such a case manifests itself when using a script through cron (and run-parts). To remedy this problem put the following at the beginning of the script:

#!/bin/sh


That way, the system will know what to use when executing the script.

Wednesday, July 15, 2009

An Elegant Matching Algorithm In Ruby

Recently, I wanted to sit down and learn Ruby (independent of Rails), so I grabbed a fairly standard hacking problem and went to town on it. I now love Ruby more than ever.

The Problem:
Given a set of people and a set of jobs, where each person may take a different amount of time to do each job, optimally match people to jobs (1:1) to minimize the amount of time it will take to complete all jobs.

Many people will recognize this as a standard matching problem for which the Hungarian Algorithm is a known solution. But for those that have implemented the Hungarian Algorithm (or seen implementations of it), you know well enough to steer clear. It is error prone, and a very specific algorithm. So, I sought to implement a more elegant (and generally applicable) solution using graphs.

I found this article over on topcoder describing max-flow algorithms and the beauty of such. I fell in love and decided that I needed to solve this with max-flow.

The formulation:
Let's convert our problem to a bipartite graph. Let one set of nodes be the people, and the other set of nodes be the jobs. Create an edge from each person to each job, with a weight (NOT capacity!) equal to the time it will take that person to do that job.

In our situation, the capacity for each edge is one since only one person can do a job. Flow is of course initialized to zero for each edge (no one is doing any of the jobs). Lastly, we connect every job to a SINK node in the graph.

The philosophy and algorithm: (the important part)
Essentially, we'll be finding shortest paths in the graph, from each person to the SINK (via jobs) iterating through each of the people. On each iteration, we augment the graph with the path.

To recap the topcoder article, augmenting the graph consists of incrementing the flow for each edge in the augmenting path and adjusting edges to represent the new flow/capacity. There is an edge that represents "residual capacity" with capacity == capacity - flow, and there is an edge in the reverse direction that represents "upstream flow" with capacity == flow.

REMEMBER, in our case:
Capacity is ALWAYS 1.
Flow is either 1 or 0.
This is ENTIRELY independent of the cost/value/weight of an edge.

What does that mean to us, you say? Well, in our case there are only two situations: a person is assigned to a job (flow == 1), or a person is not assigned to a job (flow == 0). In the first case, where a person is assigned a job, there is an edge from the job to the person. During the algorithm, this edge essentially represents the path to UNDO the assignment. In the second case, the edge simply represents making that assignment.
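Here is a rough sketch of the whole idea, written in Python rather than Ruby and not my original implementation. Assumptions to flag: I use Bellman-Ford for the shortest-path step because the "undo" edges carry negative weights (the formulation above doesn't dictate which shortest-path algorithm to use), the function and variable names are invented for the sketch, and it assumes there are at least as many jobs as people.

```python
INF = float("inf")


def assign_jobs(times):
    """times[p][j] is the time person p needs for job j.

    Returns (assignment, total), where assignment[p] is the job given to p.
    """
    n_people, n_jobs = len(times), len(times[0])
    sink = n_people + n_jobs  # node ids: people first, then jobs, then SINK

    # Each edge is [from, to, residual capacity, weight]. Edges are stored in
    # pairs, so edge i and edge i ^ 1 are each other's reverse.
    edges = []

    def add_edge(u, v, weight):
        edges.append([u, v, 1, weight])   # forward edge: capacity 1, flow 0
        edges.append([v, u, 0, -weight])  # reverse ("undo") edge: capacity == flow

    for p in range(n_people):
        for j in range(n_jobs):
            add_edge(p, n_people + j, times[p][j])
    for j in range(n_jobs):
        add_edge(n_people + j, sink, 0)

    for p in range(n_people):
        # Bellman-Ford: cheapest path from person p to SINK in the residual graph.
        dist = [INF] * (sink + 1)
        via = [None] * (sink + 1)  # index of the edge used to reach each node
        dist[p] = 0
        for _ in range(sink):
            changed = False
            for i, (u, v, capacity, weight) in enumerate(edges):
                if capacity > 0 and dist[u] + weight < dist[v]:
                    dist[v], via[v], changed = dist[u] + weight, i, True
            if not changed:
                break
        if via[sink] is None:
            raise ValueError("no job left for person %d" % p)

        # Augment: push one unit of flow along the path, flipping each edge.
        v = sink
        while v != p:
            i = via[v]
            edges[i][2] -= 1      # residual capacity shrinks
            edges[i ^ 1][2] += 1  # the undo edge opens up
            v = edges[i][0]

    # A person -> job edge with no residual capacity left is an assignment.
    assignment = [j for p in range(n_people) for j in range(n_jobs)
                  if edges[2 * (p * n_jobs + j)][2] == 0]
    total = sum(times[p][j] for p, j in enumerate(assignment))
    return assignment, total


if __name__ == "__main__":
    print(assign_jobs([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # ([1, 0, 2], 5)
```

Notice how the reverse edges let a later person bump an earlier person off a job when that lowers the overall time, which is exactly the UNDO behavior described above.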

String Array Initialization in Java

Just for all those out there that also find string array initialization in Java counter intuitive (especially if you are also a ruby enthusiast)...

To initialize an array in Java, the syntax is

String[] foo = {"bar", "serf", "doodle"};

The key is the curlies. =)

initializationError0 in JUnit

If you end up with an initializationError0 error coming out of JUnit, it is because the JUnit engine can't invoke your test method. Most likely this is because the method you annotated as a test takes a parameter. Simply remove the parameter, and you should be all set.

Monday, July 13, 2009

RSync for Backup over SSH using Different Port Number and Bandwidth Limit

Over the years, I've come to love rsync for offsite backups. It is incredibly flexible and can run over SSH. Here is the most flexible one-line backup you'll ever see:


rsync --bwlimit=100 --partial --progress --size-only -av "/Volumes/Shed/stuff/" --rsh='ssh -p 2828' "foo@offsitebackup.com:/home/stuff/"


This backs up my local stuff (in /Volumes/Shed/stuff) and puts it in /home/stuff on offsitebackup.com. It also keeps partially transferred files (--partial) and shows progress (--progress). When comparing two files, it only considers the sizes (--size-only). I do this because dates could be different. Furthermore, it transfers using ssh, but over a different port (2828 in this case). Finally, I limit the bandwidth that rsync consumes (--bwlimit) to 100 KB/s.

Very handy.

Wednesday, July 1, 2009

JDBC to MySQL Datetime (Time truncated)

Be careful when accessing DATETIME fields in a MySQL database through JDBC. You might think that getDate(), which returns a java.sql.Date, would work, but that actually truncates the value back to midnight. In order to access the actual time, use getTimestamp() instead. Here is the code:

ResultSet rs = ...
while (rs.next()) {
  // getTimestamp() keeps the time of day; getTime() converts it to epoch milliseconds
  return rs.getTimestamp(1).getTime();
}

Sorting list of Files in Java (returned from listFiles)

Here is some handy code to get files in order...

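// Requires: java.io.File, java.util.Arrays, java.util.Comparator
File[] files = dir.listFiles();
// Sort the files alphabetically by name
Arrays.sort(files, new Comparator<File>() {
  public int compare(File f1, File f2) {
    return f1.getName().compareTo(f2.getName());
  }
});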

Tuesday, June 30, 2009

Vim syntax highlighting on Ubuntu

To get vim syntax highlighting:

First, make sure you have vim installed in addition to plain old vi.

sudo apt-get install vim


Then, create a vimrc dot file.

vim ~/.vimrc


In that file, add the following:

:syntax on


That's it! Next time you edit a file in vim, you should have syntax highlighting.

Wednesday, June 24, 2009

Pidgin (and Adium) not connecting to Yahoo

OK, I entered a phase where my Yahoo login started failing in Pidgin on Ubuntu. I first went in through my.yahoo.com and reactivated my account. It turns out that Yahoo turns off your account if you don't log in for a few years. =)

After that, it was a problem with the underlying libraries underneath both Pidgin and Adium. On Mac, I updated to the latest version of Adium (1.3.5) and that fixed it.

On Ubuntu, it turns out that Ubuntu won't update to new versions unless there is a security problem. In order to get functional enhancements, you need to upgrade using Pidgin's instructions here:
http://www.pidgin.im/download/ubuntu/

After you enter those two commands, start update manager and you should be able to upgrade pidgin.

Thursday, June 18, 2009

attachment_fu: No thumbnail generation (make sure you have a parent_id!)

I've been using attachment_fu to upload images in a Rails application. Everything was working well except the thumbnails weren't generating. I think everyone follows Mike Clark's excellent blog:
http://clarkware.com/cgi/blosxom/2007/02/24

He doesn't mention that the parent_id is required, so I thought it was simply the id of the model to which the photo/upload was attached.

After some digging, I found a conversation here:
http://www.ruby-forum.com/topic/104213

It turns out attachment_fu creates a new entry for each thumbnail, then strings them together with the parent_id. So, make sure you keep the parent_id column; otherwise attachment_fu silently fails to generate the thumbnails.

Wednesday, June 17, 2009

LDAP/Active Directory Authentication with Apache

Alright, it took a few hours to get all the parameters correct, but we finally achieved centralized authentication by linking Apache authentication to our ActiveDirectory. The critical concept to keep in mind when doing it is that there are two things you need to specify. First, you need to specify the user that Apache will connect as, known as the "BindDN". Second, you need to specify the query string that allows Apache to locate a user in the directory. This is the LDAP URL.

In the end, this is the element we needed to add to our apache config. On ubuntu, we dropped this into the site-specific configuration file in /etc/apache2/sites-available.


<Location />
AuthBasicProvider ldap
AuthType Basic
AuthzLDAPAuthoritative off
AuthName "Portal"
AuthLDAPURL "ldap://activedirectorymachine/DC=foo,DC=com?sAMAccountName?sub?"
AuthLDAPBindDN "CN=apache,CN=Users,DC=foo,DC=com"
AuthLDAPBindPassword "PASSWORD"
require valid-user
</Location>


In the above example, I created a user specifically for apache, with password PASSWORD. I highly recommend using JXplorer to verify your bind credentials.

When configuring JXplorer, if you are using ActiveDirectory, most likely you'll need a Base DN as well. This was "DC=foo,DC=com". The user name is the exact string from above, same with password.

The AuthLDAPURL is a query that will be used to grab the entry associated with the username that the user types in when prompted by the browser. In the example above, it will search within DC=foo,DC=com against the attribute "sAMAccountName".

I hope this helps people out.

Sunday, June 14, 2009

git installation on Mac OSX (to fix undefined method acts_as_mappable)

I recently started playing around with geokit within rails. I kept getting an "undefined method `acts_as_mappable'" error. It turns out that the script/plugin command for the installation of the geokit-rails plugin was silently failing. When I entered the script/plugin install command, the command would run and not output anything to the screen. I ended up with an empty vendor/plugins directory.

The root cause ended up being:

I didn't have git installed and on my path. So, I installed it first using this:
http://code.google.com/p/git-osx-installer/

And added /usr/local/git/bin to my path.

(Then later I installed git via ports.)

When I returned to the rails app and re-entered:
script/plugin install git://github.com/andre/geokit-rails.git

All worked properly.

Friday, June 5, 2009

WMV to FLV Conversion and Thumbnail Generation

Here are two handy command lines for converting WMV to FLV (which is a better format for hosted video), and generating thumbnails for those videos. Both use ffmpeg.

To convert from wmv to flv:
ffmpeg -i test_video_clip.wmv -ab 48 -ar 22050 -s 512x384 -g 50 -qblur 1 -pass 1 -b 800 -r 25 -y encodedvideo.flv

To generate the thumbnail:
ffmpeg -i test_video_clip.flv -an -ss 00:00:03 -r 1 -vframes 1 -y %d.jpg

Tuesday, May 12, 2009

Eclipse on Mac OS X

I don't know why Java support is so bad on Mac OS X. If you want to run Eclipse on a Mac you may need to finagle it a bit. When I first tried to run Eclipse, I got the following error:
_NSJVMLoadLibrary: NSAddLibrary failed for /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Libraries/libjvm.dylib

To fix this, I first followed this blog post:
http://blog.kischuk.com/2008/05/08/running-eclipse-on-macbooks-with-java-6/

But unfortunately, I didn't have 1.5.0 installed. After a lot of googling, I found that Update 4 (available to registered developers through ADC) includes JDK 1.5.0.

So, if you are having trouble getting Eclipse running, install Java for Mac OS X Update 4.

Wednesday, April 29, 2009

Invalid Arguments 501 5.5.4 Error when sending SMTP email through exchange

OK, I recently needed to do a quick and dirty mail send from Java using an Exchange server. I kept getting a 501 5.5.4 error back, which is Invalid Arguments to the RSET method. It turns out, when looking at the example:
http://snippets.dzone.com/posts/show/3328

I had neglected to set the submitter properly. When googling, all of the discussions were around trailing dots or spaces. After wiresharking it, I finally saw the problem.

Word to the wise, make sure when you are using AUTH with SMTP to set the submitter properly.

Tuesday, April 21, 2009

Vanilla Discussion Forum : Initial Login Problems

In the past I've been a big phpBB fan, but recently I've wanted a leaner meaner discussion forum for integration into an existing site/application. I stumbled upon Vanilla.

http://getvanilla.com/

So far, I'm a big fan. I did have some trouble getting set up though. When installing, you set the COOKIE_DOMAIN, which PHP then uses to store your cookie. This needs to match the URL through which you are accessing the site. If it does not, the authentication will work but you will not be signed in. In other words, if you type in the wrong password you will be rejected. If you type in the correct password you will be forwarded to the main page, but you will still see the option to sign in up in the top right hand corner.

Specifically this will happen if you have it deployed locally and access it using "localhost". In this case, simply map your deployment domain to your local IP address in your /etc/hosts file.

Also make sure your conf/settings.php contains the correct domain. I searched for localhost in that file and made sure all were changed to be the deployment domain.

I hope this helps anyone trying to get started with Vanilla.

Wednesday, April 15, 2009

Building a REST Service in PHP

OK, I'm a big fan of using the right tool for the job. Daily I switch between Rails, Java and PHP. Rails is probably the easiest to use for developing a REST Service, but sometimes you don't want the overhead of a rails server and you want that feel good all over sense of stability that LAMP gives you.

First, I created an alias to send all requests to a single php file (service.php).

AliasMatch /Beers/.* /var/www/www.liquidmirth.com/service.php

This sends all http://www.liquidmirth.com/Beers/* requests to service.php. Then, I created a generic service class. This grabs the method, id and the target object from the url, and uses reflection to invoke the appropriate method on a service class. The code for the service.php class is below.


<?php
include 'beer.php';
header('Content-Type: text/xml');
// The HTTP verb (get, post, ...) doubles as the name of the method to invoke.
$method = strtolower($_SERVER["REQUEST_METHOD"]);
// Strip the leading slash, then split the path into object and id.
$object = substr($_SERVER["REQUEST_URI"], 1);
$vars = explode("/", $object);
$object = $vars[0];
$id = $vars[1];
// echo "method ==> (".$method.")\n";
// echo "object ==> (".$object.")\n";
// echo "id ==> (".$id.")";
// Look up the static method on the class named in the URL and call it.
$method = new ReflectionMethod($object, $method);
echo $method->invoke(NULL, $id);


In the above code, the request is received, and the method named after the HTTP verb is invoked statically on the class named in the URL, with the id supplied as a parameter. So, when http://www.liquidmirth.com/Beer/1234 is requested, Beer::get(1234) is invoked.

It's that simple.
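For anyone who wants to play with the dispatch idea outside of PHP, here is the same pattern sketched in Python. It is not a port of service.php: the Beer class body, the RESOURCES table and the dispatch function are all invented for the sketch. One deliberate difference is that it looks the class up in an explicit table instead of trusting whatever name appears in the URL.

```python
class Beer:
    """Stand-in resource class; the real one would hit a database."""

    @staticmethod
    def get(beer_id):
        return '<beer id="%s"/>' % beer_id


# Only classes listed here can be reached from a URL.
RESOURCES = {"Beer": Beer}


def dispatch(http_method, path):
    """Route e.g. ("GET", "/Beer/1234") to Beer.get("1234")."""
    parts = path.strip("/").split("/")
    resource_name, resource_id = parts[0], parts[1]
    resource = RESOURCES[resource_name]
    # The HTTP verb doubles as the method name, as in the PHP version.
    handler = getattr(resource, http_method.lower())
    return handler(resource_id)


if __name__ == "__main__":
    print(dispatch("GET", "/Beer/1234"))  # <beer id="1234"/>
```

Adding a new resource is then a matter of writing a class with get/post/put/delete methods and registering it in the table.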

Wednesday, April 8, 2009

Fixing Sendmail (dead.letter issue)

OK, we have lots of machines that we host out on Slicehost. We want them to be part of our domain (foo.ourdomain.com), so we configure their hostname to be that. Unfortunately, we also want them to be able to send mail to people in our domain (john@ourdomain.com). Since the machine thinks it is part of the domain, it looks for that user locally, can't find them and consequently fails to send the mail out.

Testing sendmail, it always drops the mail into dead.letter immediately.

To fix this, I simply edited /etc/mail/sendmail.cf and uncommented the domain, setting it to some fictitious name (since the machine won't be the destination for any mail).

I uncommented this line:
Dj$w.Foo.COM

Monday, January 5, 2009

Linux System Information

Presently we are building an in-house compute cloud, which means administering a bunch of machines. To get quick information on the compute power of a machine, look in:

/proc/meminfo
/proc/cpuinfo
/etc/issue (since these are Ubuntu machines)
and run df -k for disk space.

-brian