OT: Django 1.2 on track

A little off topic but I'm very pleased to see that the timetable for Django 1.2 seems to be holding since they just entered feature freeze with beta 1.

Among my favorite of features to come is, of course, support for multiple database backends. Other goodies are smart IF-tags in templates and template caching.


Sun Oracle PostgreSQL

I really don't know if this is funny or sad:


EnterpriseDB + Red Hat = <3

This press release from Reuters about Red Hat investing in EnterpriseDB is great news.

Since Red Hat once manage to bring Linux to the room of Enterprise IT I'm hoping this would increase the acceptance of PostgreSQL adoption in the Enterprise world. We all know how slowly the Enterprise wheels are turning... :)


Interesting choice of words

Just read the article Is it Postgres' time to shine? and one sentence caught my attention:

Postgres is an enterprise Java database, more suitable for carrying corporate data than the Web's consumer data.

How do you interpret the first part? That PostgreSQL is a suitable database backend for enterprise Java or that PostgreSQL is written in Java?

My thought was "If I didn't know about PostgreSQL I would think that it was a database system written in Java". What's your take?

(As a side note I don't think that article was particularly interesting but I suspect CNET News to have a fair amount of management among their readers.)


strip html tags but keep href attribute value

To be able to match hostnames in links and at the same time get the benefits of tsearch I created a small function to strip of html tags while keeping the link intact for tsearch to tokenize.

The second regexp_replace is really not necessary since tsearch will ignore any HTML tags, or rather see them as XML tokens.

I'm sure there are more clever ways of accomplish the same thing but this seemed as a fine compromise for the moment. Thoughts and comments are of course welcome. :)

begin;
    -- strip tags function
    -- we use this to strip all html tags but still preserving the href
    -- attribute value so tsearch later can match host.
    -- Does two runs:
    -- 1) strip all tags containg the attribute href but preserve the 
    --    attribute value and put it in parentheses.
    -- 2) strip of any remaining tags
    CREATE OR       REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
        SELECT regexp_replace(
            regexp_replace($1,
               E'<[^>]*?(\s* href \s* = \s* ([\'"]) ([^>]*?) ([\'"]) ) [^>]*?>',
               E' (\\3) ',
                'gx'),
            E'(< [^>]*? >)',
            E'',
             'gx')
    $$ LANGUAGE SQL;

pgbouncer: stats entries in syslog

Got a little confused interpreting the stats entries from pgbouncer in my syslog. For a second didn't realize it was the stat period average shown (as described in the usage docs). Silly me. :)


Quick understanding of HBase

In an article at High Scalability this article explaining HBase on a conceptual level was referenced. It's a very good starting point for understanding the basic concept of HBase (and BigTable) and it's no more than a five minute read.


Finally a real world webservice that enterprises can use to save money

Meet the Meeting Ticker.


"No to SQL"

Every time I see something or hear something like this I sigh a little bit. Not only when it's related to SQL but in the world of computer professionals in general. "The right tool for the job" seems to be a hard concept to understand sometimes. I wonder why?

When you choose a tool there will be pros and cons. Every time. Understand that and you will understand that picking a tool for a job i not about "saying no" or yes to something. If you want to say no or yes go join some political movement/party/whatever.

Would I choose a RDBMS for all of my database needs? Of course not! It all depends on the project. Is the data sensitive? What about integrity? Atomic operations? Type of data? Accessability? Number of users? The number of questions you have to decide on are many and starting by saying "No to SQL" is a pretty darn stupid. In my book that only proves that you do not understand what you're talking about. Trying to argue that X and Y is storing Z amount of data without using SQL only supports the lack of understanding. (Asking "why" X and Y is doing it that way on the other hand is a very good thing.)

Update: The ongoing discussion at ycombinator just shows how silly these kind of "debates" are.

Update 2: High Scalability have also noticed It Must be Crap on Relational Dabases Week.


PostgreSQL 8.4 released

Spread the word, PostgreSQL 8.4 is out!


generate_series to cover for missing data in reports

Problem: Generating a nice report grouped by date while some dates are missing data. (I admit the topic indicates something a little bit more exciting...)

Previously I have solved this kind of problem in the client code or by having some sort of calendar table to join with. This time I don't have any client code (yet) and I have no other tables to use. Lucky me I'm using PostgreSQL since it has the generate_series function.

So basically all I have to do is a RIGHT OUTER JOIN with generate_series as dates:

select date '2009-02-01' + generate_series(0,6);

An full example to play with:

drop table if exists mytable;
create table mytable(id serial, day date,     metric int);
insert into mytable(day, metric) values('2009-02-01', random()*10),
('2009-02-01', random()*10), ('2009-02-01', random()*10),
('2009-02-03', random()*10), ('2009-02-03', random()*10),
('2009-02-03', random()*10), ('2009-02-03', random()*10),
('2009-02-04', random()*10), ('2009-02-04', random()*10),
('2009-02-05', random()*10), ('2009-02-05', random()*10),
('2009-02-05', random()*10), ('2009-02-05', random()*10),
('2009-02-07', random()*10), ('2009-02-07', random()*10);

Now try it out:

select
    coalesce(mytable.day, foo.day) as day,
    coalesce(mytable.sum, foo.sum) as sum
from
    (select day, sum(metric)
     from mytable group by day) as mytable
right outer join
    (select date '2009-02-01' + foo.day      as day, 0 as sum
     from generate_series(0,6) as foo(day)) as foo
on mytable.day=foo.day
order by day;

As stated earlier I have usually solved this problem in client code. I recall doing so both when using MSSQL and Oracle although I haven't worked with them for a long while now. I can't recall ever having done any similar reports while using MySQL.

I'm curious about the current state of Oracle, MSSQL and MySQL since I'm not actively using them. Is there something similar as generate_series? What about DB2?


Rebuild template1

A colleague of mine got his template1 corrupted and I just told him to recreate it from template0. But there's a catch of course, template1 being a template database.

The quick fix for this is to connect as superuser, set template1 as not being a template database, drop and recreate the template1 from template0:

$ psql -U postgres postgres
postgres=# update pg_database set datistemplate = false where datname='template1';
UPDATE 1
postgres=# drop database template1;
DROP DATABASE
postgres=# create database template1 template=template0;
CREATE DATABASE
postgres=# update pg_database set datistemplate = true where datname='template1';
UPDATE 1
postgres=#

I also found an old, but not outdated, article by Josh Berkus about pretty much the same issue, explaining it more thorough, on the PostgreSQL Wiki.


A sane default

Without any changes to template1 a sane default when creating a new database could be something like the script below.

It creates a newdatabase newdb with two users, adminuser for administration and rouser for read only access. (I just realized I forgot a rwuser but never mind.)

The only thing that bugs me is the tedious work of granting access to rouser when adding stuff to the database. Any takers on that one?

begin;
    create role dbname nosuperuser nocreatedb nocreaterole noinherit nologin;
    create role adminuser nosuperuser nocreatedb nocreaterole noinherit login encrypted password 'secretpassword';
    grant dbname to adminuser;
end;

create database dbname with owner=adminuser;

\connect dbname
begin;
    revoke all on database dbname from public;
    create role rouser nosuperuser nocreatedb nocreaterole noinherit login encrypted password 'secret';
    grant all on schema public to adminuser with grant option;
    revoke all on schema public from public;
    grant connect on database dbname to rouser;
    grant usage on schema public to rouser;
end;

\connect dbname adminuser
begin;
    create table account(id serial, created timestamp  default now(), name text not null, password text not null);
    grant select on account to rouser;
end;

-- drop database dbname;
-- drop role rouser;
-- drop role adminuser;
-- drop role dbname;

Fibonacci sequence using WITH RECURSIVE

-- Fibonacci sequence
-- Example to demo WITH RECURSIVE in PostgreSQL 8.4
with recursive f as (
    select 0 as a, 1 as b
    union all
    select b as a, a+b from f where a < 100
) select a from f;

Results in:

  a 
-----
   0
   1
   1
   2
   3
   5
   8
  13
  21
  34
  55
  89
 144
(13 rows)

Common Table Expressions (WITH and WITH RECURSIVE)

Update: Updated note about DB2 version and added a note about Oracle.

Update: Firebird has also implemented CTE.

Going through the presentations from FOSDEM'09 I opened up David Fetter's Trees and More presentation and thought "what could be new about trees?". Man was I wrong!

The presentation shows classical solutions to the problem of trees and hierarchical data in SQL and then moves on to show how this can be solved by the implementation of WITH RECURSIVE in the upcoming PostgreSQL 8.4.

Taking a further look at WITH and WITH RECURSIVE I realised that this is a very powerful tool. Not entirely new, as Depez points out, but still a very usable thing. RECURSIVE is the key part that is new and that could not be solved in a very good way previously.

I have just started to think about what this will mean for our database and so far I've found a case where this is gonna make some difference. Of course I have to make some experiments to verify my theory. :-D

As far as I can tell PostgreSQL is the third RDBMS implementing the WITH and WITH RECURSIVE statement, known as Common Table Expressions or CTE, from the 2003 (WITH) and 2008 (WITH RECURSIVE) SQL-standard. Previously CTE has been implemented by Microsoft, introduced in MS SQL Server 2005, IBM, with DB2 V7, and by Firebird SQL Server, since v2.1.

I thought I could slip under the radar about Oracle but I repent with this note. ;-) Oracle have had WITH since v9.0 and has had CONNECT BY that serves the purpose of WITH RECURSIVE for ages.

I'm not gonna write any examples here since there are a lot of them out there.

Resources

(Although I might come back with a little more hands on example after playing around with it. For now I recommend Depez post as it contains a good discussion and example of the feature and what it'll mean for PostgreSQL users.)


RSS 2.0