Archive for December, 2008

Archetype_tool, QueueCatalog: be careful when indexing with Plone’s portal_catalog!

Posted in plone on December 14, 2008 by toutpt

The portal_catalog in Plone is the index of all objects. This means the portal_catalog stores some data (indexes and metadata) for every object in the website, so as the number of entries in the site grows, the size of the portal_catalog grows with it. Tarek's post "Indexing" explains it well:

50% of the size of the ZODB was the catalog (I am talking about gigas here)

And that is the case for some of the websites I am working on.

If you store about 10 indexes and 10 metadata columns in it, the size of a brain with its indexes for one object is not far from the size of the real object (brains are stored persistent objects!). So if you are about to add an index or a metadata column, think twice before doing so.

But, like me, you will want to index new data from a new content type so that you can search for objects matching your criteria. For example, you have a type “Contact” with a field “EMail”. Many developers will just add a getEmail index to the portal_catalog. Please don’t do it! Instead, add a new catalog tool that inherits from the CMFPlone CatalogTool and change its index initialization method (Plone 2.5 doesn’t handle the GenericSetup way of managing catalog indexes and metadata well). Next, add the path index, which the Archetypes CatalogMultiplex class needs for reindexObject, and register this catalog in the archetype_tool so that it is automatically kept in sync with your content.
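To make the idea concrete, here is a minimal plain-Python model of it. None of these classes are Plone APIs: ToyCatalog, ToyMultiplexer, contact_catalog and the sample objects are all hypothetical names, and the model only shows how a per-type registry can send the getEmail data to a dedicated catalog while the portal_catalog stays small.

```python
# Plain-Python model of the idea; none of these classes are Plone APIs.


class ToyCatalog:
    """Maps an object's path to the values of the declared indexes."""

    def __init__(self, name, indexes):
        self.name = name
        self.indexes = tuple(indexes)
        self.entries = {}  # path -> {index name: value}

    def index_object(self, path, obj):
        self.entries[path] = {i: getattr(obj, i, None) for i in self.indexes}

    def search(self, **query):
        """Return the sorted paths whose values match every query term."""
        return sorted(
            path for path, values in self.entries.items()
            if all(values.get(k) == v for k, v in query.items())
        )


class ToyMultiplexer:
    """Stands in for the per-type catalog registry kept by archetype_tool."""

    def __init__(self):
        self.catalogs_by_type = {}

    def set_catalogs_for_type(self, portal_type, catalogs):
        self.catalogs_by_type[portal_type] = list(catalogs)

    def reindex(self, path, obj):
        # Every catalog registered for the object's type gets updated.
        for catalog in self.catalogs_by_type.get(obj.portal_type, []):
            catalog.index_object(path, obj)


class Contact:
    portal_type = "Contact"

    def __init__(self, title, email):
        self.Title = title
        self.getEmail = email


class Page:
    portal_type = "Document"

    def __init__(self, title):
        self.Title = title


portal_catalog = ToyCatalog("portal_catalog", ["Title"])
contact_catalog = ToyCatalog("contact_catalog", ["getEmail"])

tool = ToyMultiplexer()
tool.set_catalogs_for_type("Document", [portal_catalog])
tool.set_catalogs_for_type("Contact", [portal_catalog, contact_catalog])

tool.reindex("/site/about", Page("About"))
tool.reindex("/site/contacts/jane", Contact("Jane", "jane@example.org"))

print(contact_catalog.search(getEmail="jane@example.org"))
print(len(portal_catalog.entries), len(contact_catalog.entries))
```

The point of the design is that the getEmail values are stored once, for contacts only, instead of adding a column to every brain of the site.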

You can query the new catalog just like the portal_catalog, and if you need metadata from the portal_catalog without waking up the object, just query the portal_catalog using the path index. Recently I totally removed a content type from the portal_catalog. That was possible because this content type was not displayed through the classic Plone folder_contents view. As a result, 30,000 brains were removed from the portal_catalog.
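The two-step lookup described above can be sketched with plain dictionaries. This is only a toy model under stated assumptions: the two "catalogs" are dicts keyed by path, and find_paths, metadata_for and all the sample values are hypothetical, not Plone calls.

```python
# Toy data: both "catalogs" are dicts keyed by path (hypothetical values).
contact_catalog = {
    "/site/contacts/jane": {"getEmail": "jane@example.org"},
    "/site/contacts/john": {"getEmail": "john@example.org"},
}
portal_catalog = {
    "/site/contacts/jane": {"Title": "Jane Doe", "review_state": "published"},
    "/site/contacts/john": {"Title": "John Doe", "review_state": "private"},
    "/site/about": {"Title": "About", "review_state": "published"},
}


def find_paths(catalog, **query):
    """First step: return the paths whose stored values match the query."""
    return sorted(
        path for path, values in catalog.items()
        if all(values.get(key) == wanted for key, wanted in query.items())
    )


def metadata_for(paths, catalog):
    """Second step: look up metadata by path, without loading any object."""
    return [dict(catalog[path], path=path) for path in paths if path in catalog]


paths = find_paths(contact_catalog, getEmail="jane@example.org")
results = metadata_for(paths, portal_catalog)
print(results)
```

The path is the only value the two catalogs need to share, which is why the dedicated catalog must have a path index.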

Think about why you would add an index to the portal_catalog. Does every object in your website have that field, or is it supposed to be able to have it? If the answer is yes, you can add it. A good example is rating information: if you want to be able to sort results on it, you must use the portal_catalog.

Another very important point about indexing with Plone concerns full-text indexing and its performance. As you will find on the mailing lists, lots of people complain about ZCTextIndex: it is slow! This week I ran a benchmark with PloneQueueCatalog and the factory hack. I delayed every ZCTextIndex and put a Python timing decorator on the indexObject, reindexObject and unindexObject methods of the CatalogMultiplex class. The results were:

  • 0.039 seconds with ZCTextIndexes indexed immediately and 0.025 seconds with ZCTextIndexes delayed, measured on the last indexObject call.
  • 0.5 seconds for the unnecessary index + reindex + unindex in portal_factory without the factory hack, and 0.11 seconds with the hack. This one is not improved by QueueCatalog.

The portal_catalog has 130,000 objects indexed, and the Data.fs is about 1.3 GB. This benchmark was run on my laptop (Intel Core 2 Duo, 2 GB RAM).
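For readers who want to repeat this kind of measurement, a timing decorator can be as small as the sketch below. It is not the code used for the benchmark: the timed decorator, the timings dictionary and the FakeContent class are hypothetical, and FakeContent only stands in for the real mixin.

```python
import functools
import time

timings = {}  # function name -> list of elapsed seconds


def timed(func):
    """Record how long each call to func takes, even if it raises."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)

    return wrapper


class FakeContent:
    """Hypothetical stand-in for a content class; not the Archetypes mixin."""

    @timed
    def indexObject(self):
        time.sleep(0.01)  # pretend to do some indexing work
        return "indexed"


result = FakeContent().indexObject()
average = sum(timings["indexObject"]) / len(timings["indexObject"])
print("indexObject: %.3f s on average" % average)
```

Collecting every call in a list, rather than printing each one, makes it easy to compute an average over many calls afterwards.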

So I am now using those products in production.
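The general idea behind delaying an expensive index can be modelled in a few lines. This is a sketch of the principle only, not of how PloneQueueCatalog is implemented: ToyTextIndex and DeferredIndexer are hypothetical classes, and collapsing repeated requests for the same object is an assumption of this model.

```python
from collections import OrderedDict


class ToyTextIndex:
    """Stand-in for an expensive full-text index (counts how often it runs)."""

    def __init__(self):
        self.words = {}  # uid -> set of words
        self.runs = 0

    def index(self, uid, text):
        self.runs += 1
        self.words[uid] = set(text.lower().split())


class DeferredIndexer:
    """Queues indexing requests and keeps only the latest one per uid."""

    def __init__(self, index):
        self.index = index
        self.pending = OrderedDict()

    def request_index(self, uid, text):
        self.pending[uid] = text  # a later request replaces an earlier one

    def process(self):
        """Run the queued work, for example from a periodic background job."""
        done = 0
        while self.pending:
            uid, text = self.pending.popitem(last=False)
            self.index.index(uid, text)
            done += 1
        return done


text_index = ToyTextIndex()
queue = DeferredIndexer(text_index)

# Three requests arrive while content is being created and edited ...
queue.request_index("doc-1", "draft")
queue.request_index("doc-1", "draft with more words")
queue.request_index("doc-2", "another document")

# ... but the expensive index only runs once per object, later on.
processed = queue.process()
print(processed, text_index.runs)
```

The request itself becomes cheap, and the cost of the text index is paid outside the user's request.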

FOR TESTING PURPOSES: I would like to use a relational database to replace the portal_catalog. PostgreSQL now supports full-text indexing, and it would be pretty easy to return basic objects that expose the results of an SQL query as attributes, with a good JMeter test plan to measure the results. If anyone has already done that kind of test, I am really interested; please tell me.
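A rough sketch of "basic objects that expose the results of an SQL query as attributes" could look like this. To stay runnable anywhere it swaps in SQLite from the standard library for PostgreSQL, and a plain LIKE match for real full-text search; the catalog table, its columns and the RowBrain class are all hypothetical.

```python
import sqlite3


class RowBrain:
    """Exposes the columns of one result row as attributes."""

    def __init__(self, row):
        for column in row.keys():
            setattr(self, column, row[column])


def search(connection, word):
    """Return one RowBrain per matching row (LIKE stands in for full text)."""
    cursor = connection.execute(
        "SELECT path, title FROM catalog WHERE body LIKE ? ORDER BY path",
        ("%" + word + "%",),
    )
    return [RowBrain(row) for row in cursor]


connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # rows know their column names
connection.execute(
    "CREATE TABLE catalog (path TEXT PRIMARY KEY, title TEXT, body TEXT)"
)
connection.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [
        ("/site/news/a", "Release", "plone release notes"),
        ("/site/news/b", "Meeting", "minutes of the meeting"),
    ],
)

brains = search(connection, "release")
print([(brain.path, brain.title) for brain in brains])
```

With PostgreSQL, the WHERE clause would presumably use its full-text operators instead of LIKE, but the object side of the sketch would stay the same.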

A word about lorem ipsum (aka lipsum) generator and Python

Posted in lipsum, plone, python, selenium on December 10, 2008 by toutpt

Today I wrote some functional tests for Plone with Selenium, using a lorem ipsum text generator. The first thing I did was search for a Python script that generates text à la lipsum.

I was surprised that most of the scripts I found use already-generated text.

But on the official website we can find a Python script: pypsum. I used it and derived a script from it to generate a CSV file this way:

python data_csv.py -a 10 -f paras:3:texte_paras:1:description_words:5:title -d articles10_20081210_02.csv
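As an illustration of what such a generator can do, here is a minimal sketch. It is not data_csv.py: the reading of the -f value as kind:count:name triples joined by underscores is a guess, and WORDS, parse_fields and write_csv are hypothetical names.

```python
import csv
import io
import random

WORDS = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()


def words(count, rng):
    return " ".join(rng.choice(WORDS) for _ in range(count))


def paragraphs(count, rng):
    # Each paragraph is 20 to 40 random words, capitalized, ending with a period.
    return "\n\n".join(
        words(rng.randint(20, 40), rng).capitalize() + "." for _ in range(count)
    )


GENERATORS = {"words": words, "paras": paragraphs}


def parse_fields(spec):
    """Parse 'kind:count:name' triples joined by '_' (a guessed format)."""
    fields = []
    for chunk in spec.split("_"):
        kind, count, name = chunk.split(":")
        fields.append((name, GENERATORS[kind], int(count)))
    return fields


def write_csv(stream, spec, rows, seed=0):
    """Write a header line, then `rows` lines of generated text."""
    rng = random.Random(seed)
    fields = parse_fields(spec)
    writer = csv.writer(stream)
    writer.writerow([name for name, _, _ in fields])
    for _ in range(rows):
        writer.writerow([make(count, rng) for _, make, count in fields])


buffer = io.StringIO()
write_csv(buffer, "paras:3:texte_paras:1:description_words:5:title", rows=10)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[0])
```

Building the text from a word list at random, instead of slicing a fixed lorem ipsum passage, is what keeps every generated article different.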

And then:

python selenium_html.py -c articles10_20081210_02.csv -d articles10_20081210_02.html -o addArticles

You can find all my tests here.

Sure, the Selenium part is Plone-specific code, but I am pretty happy with the result. I can now create random content for any website very quickly.
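The second step, turning the CSV file into a Selenium HTML test case, can be sketched as follows. This is not selenium_html.py: the URL, the field locators and the button name are assumptions about a Plone edit form, and selenese_row, add_article_steps and build_test_case are hypothetical helpers. Only the three-column table layout and the open, type and clickAndWait commands come from Selenium's HTML test format.

```python
import csv
import html
import io


def selenese_row(command, target="", value=""):
    """One test step: a table row with command, target and value cells."""
    cells = "".join(
        "<td>%s</td>" % html.escape(cell) for cell in (command, target, value)
    )
    return "<tr>%s</tr>" % cells


def add_article_steps(article):
    """Steps for one article; the URL and locators are hypothetical."""
    return [
        selenese_row("open", "/createObject?type_name=Document"),
        selenese_row("type", "name=title", article["title"]),
        selenese_row("type", "name=description", article["description"]),
        selenese_row("clickAndWait", "name=form.button.save"),
    ]


def build_test_case(csv_text, name="addArticles"):
    """Build the whole HTML page from CSV text with a header line."""
    rows = []
    for article in csv.DictReader(io.StringIO(csv_text)):
        rows.extend(add_article_steps(article))
    return (
        "<html><head><title>%s</title></head><body>\n"
        "<table>\n<thead><tr><td colspan=\"3\">%s</td></tr></thead>\n"
        "<tbody>\n%s\n</tbody>\n</table>\n</body></html>\n"
        % (html.escape(name), html.escape(name), "\n".join(rows))
    )


sample_csv = "title,description\nLorem ipsum,Dolor <sit> amet\n"
page = build_test_case(sample_csv)
print(page)
```

Escaping every cell matters here, because generated text can contain characters that would otherwise break the HTML table.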