archetype_tool, QueueCatalog: be careful when indexing with Plone’s portal_catalog!

Posted in plone on December 14, 2008 by toutpt

The portal_catalog in Plone is the index of all objects. This means the portal_catalog stores some data (indexes and metadata) for each object in the site, so as the number of entries in the site grows, the size of the portal_catalog grows with it. Tarek’s post on indexing explains it well:

50% of the size of the ZODB was the catalog (I am talking about gigas here)

And that is the case for some of the websites I am working on.

If you store about 10 indexes and 10 metadata columns in it, the size of a brain plus its index entries for one object is not far from the size of the real object (brains are stored persistent objects!). So if you are about to add an index or a metadata column, think twice before doing so.

But, like me, you want to be able to index new data from your new content type so that you can search for objects matching your criteria. For example, you have a type “Contact” with a field “EMail”. Many developers will just add a getEmail index to the portal_catalog. Please don’t do it! Instead, add a new catalog tool that inherits from the CMFPlone CatalogTool and change its index initialisation method (Plone 2.5 doesn’t support the GenericSetup way of managing catalog indexes and metadata very well). Next, add the path index, which the Archetypes CatalogMultiplex class needs for reindexObject, and register this catalog in the archetype_tool so that your content is automatically kept in sync with it.

You can query the new catalog just like the portal_catalog. If you need metadata that lives in the portal_catalog and don’t want to wake up the object, just query the portal_catalog using the path index. Recently I completely removed a content type from the portal_catalog. That was possible because this content type was not displayed through the classic Plone folder_contents view. As a result, 30 000 brains were removed from the portal_catalog.
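
To make the idea concrete, here is a minimal sketch in plain Python, not real Plone code. `ToyCatalog` is a hypothetical stand-in for a catalog tool (it is not the ZCatalog API), and the paths and field values are made up. It only shows a small, type-specific catalog holding the getEmail index while the main catalog keeps the shared metadata, with the two joined through the path.

```python
class ToyCatalog:
    """Hypothetical, in-memory stand-in for a catalog tool (not the ZCatalog API)."""

    def __init__(self, index_names):
        self.index_names = tuple(index_names)
        self._records = {}  # path -> {index name: value}

    def index_object(self, path, obj):
        # Store only the indexes this catalog declares, keyed by path.
        record = {name: obj.get(name) for name in self.index_names}
        record["path"] = path
        self._records[path] = record

    def unindex_object(self, path):
        self._records.pop(path, None)

    def search(self, **query):
        # Return every record whose values match all query terms.
        return [
            record
            for record in self._records.values()
            if all(record.get(key) == value for key, value in query.items())
        ]

    def __len__(self):
        return len(self._records)


# The site-wide catalog only knows the fields every object has.
portal_catalog = ToyCatalog(["Title", "portal_type"])
# The type-specific catalog carries the extra index, for contacts only.
contact_catalog = ToyCatalog(["getEmail"])

contact = {"Title": "Jane Doe", "portal_type": "Contact", "getEmail": "jane@example.org"}
page = {"Title": "Welcome", "portal_type": "Document"}

portal_catalog.index_object("/site/contacts/jane", contact)
contact_catalog.index_object("/site/contacts/jane", contact)
portal_catalog.index_object("/site/welcome", page)

# Search the small catalog, then fetch the shared metadata by path
# from the main catalog instead of waking up the object.
hits = contact_catalog.search(getEmail="jane@example.org")
titles = [portal_catalog.search(path=hit["path"])[0]["Title"] for hit in hits]
print(titles)
```

In this toy model the main catalog never grows an index that only one content type uses, which is the whole point of the separate catalog.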

Think about why you would add an index to the portal_catalog. Does every object in your website have that field, or is it supposed to be able to have it? If the answer is yes, you can. A good example is rating information: if you want to be able to sort results on it, you must use the portal_catalog.

Another very important point about indexing with Plone concerns full-text indexing and performance. As you will find on the mailing lists, lots of people complain about ZCTextIndex indexes: they are slow! This week I ran a benchmark with PloneQueueCatalog and the factory hack. I delayed every ZCTextIndex and put a Python timing decorator on the indexObject, reindexObject and unindexObject methods of the CatalogMultiplex class. The results were:

  • 0.039 seconds with the ZCTextIndexes indexed immediately and 0.025 seconds with the ZCTextIndexes delayed, measured on the last indexObject call.
  • 0.5 seconds for the useless index + reindex + unindex in portal_factory without the factory hack, and 0.11 seconds with the hack. This one was not improved by QueueCatalog.
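
For reference, a timing decorator of this kind can be sketched in a few lines of modern Python. This is not the decorator used for the numbers above; `timed` and `FakeContent` are hypothetical names, and the sleep only stands in for real indexing work.

```python
import functools
import time


def timed(func):
    """Record the wall-clock duration of every call in ``wrapper.timings``."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if the wrapped call raises, so failures are timed too.
            wrapper.timings.append(time.perf_counter() - start)
    wrapper.timings = []
    return wrapper


class FakeContent:
    """Hypothetical stand-in for a content class whose indexing methods we time."""

    def indexObject(self):
        time.sleep(0.01)  # pretend to do some indexing work
        return "indexed"


# Wrap the method on the class, as a monkey patch would do.
FakeContent.indexObject = timed(FakeContent.indexObject)

result = FakeContent().indexObject()
last = FakeContent.indexObject.timings[-1]
print("indexObject took %.3f seconds" % last)
```

Patching the class rather than each instance means every object of that type is measured, without touching the original source.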

The portal_catalog has 130 000 objects indexed, and the Data.fs is about 1.3 GB. This benchmark was run on my laptop (Intel Core 2 Duo with 2 GB of RAM).

So I am now using those products in production.

PLANNED TESTS: I would like to try replacing the portal_catalog with a relational database. PostgreSQL now supports full-text indexing, and it would be pretty easy to return basic objects that expose the results of an SQL query as attributes, with a good JMeter test plan to measure the results. If anyone has already done that kind of test, I am really interested, so please tell me.

A word about lorem ipsum (aka lipsum) generator and Python

Posted in lipsum, plone, python, selenium on December 10, 2008 by toutpt

Today I wrote some functional tests that use a lorem ipsum text generator with Selenium for Plone. The first thing I did was search for a Python script that generates text à la lipsum.

I was surprised that most of the scripts I found reuse text that has already been generated.

But on the official website we can find a Python script: pypsum. I used that one and derived from it to generate a CSV file this way:

python data_csv.py -a 10 -f paras:3:texte_paras:1:description_words:5:title -d articles10_20081210_02.csv

And then:

python selenium_html.py -c articles10_20081210_02.csv -d articles10_20081210_02.html -o addArticles
You can find all my tests here
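
For readers without those scripts at hand, here is a minimal, self-contained sketch of the same idea: build random lipsum-like text from a small word list and write it out as CSV. It is neither pypsum nor the data_csv.py script used above; the word list, the function names and the column names are only assumptions for illustration.

```python
import csv
import io
import random

WORDS = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()


def words(rng, count):
    """Return `count` random words as one capitalised string."""
    return " ".join(rng.choice(WORDS) for _ in range(count)).capitalize()


def paragraphs(rng, count, sentences=3, sentence_length=8):
    """Return `count` paragraphs separated by blank lines."""
    return "\n\n".join(
        " ".join(words(rng, sentence_length) + "." for _ in range(sentences))
        for _ in range(count)
    )


def make_rows(amount, seed=0):
    """Build `amount` fake articles with a title, a description and a body."""
    rng = random.Random(seed)  # seeded so that the test data is reproducible
    return [
        {
            "title": words(rng, 5),
            "description": paragraphs(rng, 1),
            "text": paragraphs(rng, 3),
        }
        for _ in range(amount)
    ]


def write_csv(rows, stream):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=["title", "description", "text"])
    writer.writeheader()
    writer.writerows(rows)


rows = make_rows(10)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue().splitlines()[0])
```

Seeding the generator keeps the fake content identical from one run to the next, which makes a failing Selenium scenario easier to replay.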

Of course, the Selenium part is Plone-specific code, but I am pretty happy with the result. I can now create random content for any website very quickly.

JMeter, improving performance of a Plone web site

Posted in plone on October 13, 2008 by toutpt

Last week I made a big push to improve the performance of a Plone-based website. For performance testing I used JMeter, because I had read “Using open source tools for performance testing”.

JMeter is really nice to use. Just launch its proxy, point your browser at it, and run through your scenario. Next, save it as XML and you can edit the test. You can log in (it supports cookies), create content (with a Once Only Controller), browse content, and stress your server.

What I learned about Plone from this:

  • Do not use brains or any other objects in templates, or you will not be able to cache your logic code in the RAM cache. Use dicts that contain every string ready to be displayed in the templates.
  • How to use the RAM cache.
  • I can store acl_users in the RAM cache, and I was surprised to see the difference. Over 5 tabs visited, I hit the cache 278 times…
  • Archetypes is damn slow (about one second to set some attributes on an object in a BTree and reindex it).
  • CMFPlone.utils.createObjectByType does a reindexObject.
  • Do not add any index to the portal_catalog; use the binding done by archetype_tool to be able to use other indexes. I am adding about one catalog tool per custom content type.
  • A query on the portal_catalog can take one second if you have, for example, a list of 100 paths (query['path'] = ['/first/path', '/second/path']) and more than 100 000 entries.
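
The first points of this list can be illustrated with a small sketch in plain Python. It does not use Plone’s real RAM cache: `FakeBrain`, `to_display_dict` and `ram_cached` are hypothetical names, and the cache is a simple time-limited dictionary. The point is only that plain dicts of ready-to-display strings are safe to cache, whereas brains and content objects are not.

```python
import functools
import time


class FakeBrain:
    """Hypothetical stand-in for a catalog brain."""

    def __init__(self, title, path, modified):
        self.Title = title
        self.path = path
        self.modified = modified

    def getURL(self):
        return "http://example.org" + self.path


def to_display_dict(brain):
    """Copy what the template needs into plain strings."""
    return {
        "title": brain.Title,
        "url": brain.getURL(),
        "modified": brain.modified.strip(),
    }


def ram_cached(lifetime):
    """Cache results in memory, keyed by arguments, for `lifetime` seconds."""
    def decorator(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and now - hit[0] < lifetime:
                wrapper.hits += 1
                return hit[1]
            value = func(*args)
            store[args] = (now, value)
            return value

        wrapper.hits = 0
        return wrapper
    return decorator


@ram_cached(lifetime=60)
def latest_news(folder_path):
    # Stands in for a catalog query; only plain dicts leave this function.
    brains = [FakeBrain("Hello", folder_path + "/hello", " 2008-10-13 ")]
    return [to_display_dict(brain) for brain in brains]


first = latest_news("/site/news")
second = latest_news("/site/news")
print(latest_news.hits, first[0]["url"])
```

The template then only reads keys from the dicts, so the second call never touches the catalog at all.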

I learned many other things during the last week, and I am now using stress tests during development.

Look at my profile

Posted in plone on April 13, 2008 by toutpt

First of all, I would like to thank the people who wrote GenericSetup. I am writing this post because I think profiles are not used well: they are used like Extensions/Install.py scripts. So what is a profile, and how should you use it?

A profile is about the configuration of portal tools (the portal_xxx objects in the ZMI). With that in mind, I started by asking myself: “what does install/uninstall mean in Plone?” Some answers:

  • “It’s about the Extensions/Install.py scripts.”
  • “Like on your computer: you install a piece of software or a library.”

Well, the word “install” makes no sense in the Plone world. We are talking about the configuration of tools. So CMFPlone/profiles/default contains a default Plone configuration. Right?

In that sense, extending the default Plone configuration means changing it a bit, for example by adding some directory views to portal_skins.

What happens if you “install” 20 products into your configuration? Here it is: you have lost track of the configuration of your Plone project, and you will not be able to understand why the new product you are trying to install breaks your website.

Seeing this, I have proposed using one single profile per project. That way you control your entire Plone, and if you have a problem, you just have to apply this profile. But that also means you have to write it. Here is how I proceed:

  • Install every product you need.
  • Export all steps.
  • Add a new product/egg specifically for your project.
  • Put the results of the export in it.
  • Read it.
  • Add all the constraints to it, for example the order of the layers inside skins.xml.

Once this is done, you have it! There is no duplication of files, only the integration of the products into the Plone configuration (that is my job). Is writing XML boring? Write it faster with my Eclipse templates ;)

The next point is about setup handlers. I hate products that add a setup handler just to say “hey, I know how to add a step”. I always ask: “what is a step for you?” Steps are not a way to call a Python script. If you are tempted to add a step, just put your script in Extensions/Install.py. Adding a step only makes sense if you are adding a tool and you want that tool to be configurable. In that case you don’t add a setuphandlers.py; you write the import/export handlers for your tool, and then you add the step with import_steps.xml.

Another problem is that the configuration of your Plone project can differ between a production server and a local development server. For example, the mailhost.xml file can be different. In that case you can extend your profile with a smaller profile that reconfigures just what you need.

This is why I laugh when I hear “uninstall profile”.

Finally, I don’t understand why the portal_quickinstaller is now “aware” of extension profiles. That doesn’t help people understand profiles: they will continue to write install/uninstall profiles. If anyone knows why, I am ready to discuss it.

My point of view on BrowserView

Posted in plone on April 13, 2008 by toutpt

Since Plone 2.5 was released, there has been a good way to separate logic from presentation in templates, but developers do not all use BrowserView in the same way. Here I try to explain my point of view on that component.

Optilude has already written a very good presentation of BrowserView.

From my point of view, and following the MVC pattern, a BrowserView is just a controller. Its role is to prepare data to be displayed or to trigger a process. Most of the time I query the portal_catalog, redirect the user, add status messages, and so on.

I like the way portlets are done in Plone 2.5. For me, they are the best example of how to use BrowserView.

The other use of BrowserView is to render the attached template and insert a “view” instance into it. This is the kind of “implicit” behaviour that I hate in Zope 2. It means you can call the view directly by URL. I don’t understand that choice, but Plone 3 uses it that way, and it does not let you reuse the logic code of the BrowserView in another template. A controller is supposed to be reusable throughout the entire application. So please, use the BrowserView component like a controller.
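
As a rough sketch of what “use it like a controller” can look like, here is a plain-Python class that only prepares data, with two separate rendering functions that reuse it. There are no Zope imports here: `ContactListing`, `render_table` and `render_portlet` are hypothetical names, and the catalog query is replaced by a list of dicts.

```python
class ContactListing:
    """Hypothetical controller: prepares data, knows nothing about markup."""

    def __init__(self, context, request):
        self.context = context
        self.request = request

    def contacts(self):
        # In Plone this would be a catalog query; here the context is a list of dicts.
        wanted = self.request.get("letter", "").lower()
        rows = [c for c in self.context if c["name"].lower().startswith(wanted)]
        return sorted(rows, key=lambda c: c["name"])


def render_table(view):
    """Full page presentation."""
    return "\n".join("%s <%s>" % (c["name"], c["email"]) for c in view.contacts())


def render_portlet(view, limit=2):
    """Compact presentation reusing the same logic."""
    return ", ".join(c["name"] for c in view.contacts()[:limit])


site = [
    {"name": "Bob", "email": "bob@example.org"},
    {"name": "Alice", "email": "alice@example.org"},
    {"name": "Anna", "email": "anna@example.org"},
]
view = ContactListing(site, {"letter": "a"})
print(render_table(view))
print(render_portlet(view))
```

Both presentations call the same `contacts()` method, so the filtering and sorting logic lives in exactly one place.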

ReferenceEngine and poor Archetypes

Posted in plone on March 7, 2008 by toutpt

Today I got a bug report on a custom content type: “I can’t copy your content type, all the references are lost.”

Well, I admit I don’t know how copy/paste works in Plone. I first reproduced the bug without any problem:

  • Create two documents, a and b. Set a related document from b to a, meaning b has a as a related document. The relatesTo field is a ReferenceField.
  • Make a copy of b, called c.
  • Oh! c has no related document.

The bug is reproduced: ReferenceField doesn’t support copy. Now that I know that, I start reading everything I can find about references in Archetypes, and I find this in Referenceable.py:

####
## In the case of:
## - a copy:
##   * we want to lose refs on the new object
##   * we want to keep refs on the orig object
## - a cut/paste
##   * we want to keep refs
## - a delete:
##   * to lose refs
####

So this is not a bug but a feature? Don’t be disappointed: Plone is built to be customized. CopySupport comes from Zope; OK, there is interesting code in there. I finally tried to monkey-patch the manage_afterAdd method and succeeded in keeping the references:

## OFS Hooks
def manage_afterAdd(self, item, container):
    """
    Get a UID
    (Called when the object is created or moved.)
    """
    print "Referenceable manage after add"
    isCopy = getattr(item, '_v_is_cp', None)
    if isCopy:
        setattr(self, config.UUID_ATTR, None)
        self._delReferenceAnnotations()

Is everything OK? Well, no. I tried to validate the patch by replaying the scenario above. Document c has a as a related document: good. But when I then delete c, that action also deletes b’s relation to a (b no longer has any relation to a). And the Archetypes team writes “## * we want to lose refs on the new object”??? You call this a feature; I call it a design error in the reference engine!

Thank you, Archetypes!

If anyone knows how to fix this, I am ready to listen :)

MDA for alfresco, Meta-Model for ECM

Posted in acceleo, eclipse, plone, tools on February 18, 2008 by toutpt

Today I attended a conference about Alfresco and MDA given by BlueXML. What I discovered is a set of tools built with Eclipse and Acceleo to generate an Alfresco project (configuring the portal, creating content types, …). I was really interested by the fact that they built a new meta-model derived from UML to model an ECM project, together with a GUI modeler for this meta-model.

That means the meta-model could also be used for Plone 3. For example, configuring Plone by creating user groups, permissions, workflows, … would be available if we used it. But first I need to test it to validate the usability of the meta-model and of the modeler. Everything is available on the BlueXML home site.

Plone3 & Acceleo, the first step

Posted in acceleo, plone on February 10, 2008 by toutpt

I have spent some hours on a simple Plone 2.5 code generator with Acceleo. It is available in the Acceleo SVN:

svn checkout svn://svn.forge.objectweb.org/svnroot/acceleo/trunk/modules/community/uml21/zope/plone/25/org.acceleo.module.pim.uml21.plone25/trunk

This code generator is not finished, but the approach is good enough to start the Plone 3 one. Here I would like to explain the purpose of the Acceleo generator for Plone 3 that I want to build. I will first explain the problems I have with ArchGenXML.

What i don’t like in AGX:

  • The license in each source file (i prefer just a license.txt file)
  • The billion tagged values (i have lost hours here)
  • The generated code itself doesn’t look like the code I would have written
  • The command line
  • ArgoUML
  • All the hacks done everywhere to make the code compatible with two versions of Plone
  • You can’t modify a line of generated code without losing it when you regenerate your code

What i like in AGX:

  • The way you use UML (copy the model, then draw a simple class diagram, and it’s up)
  • The i18n-ready schema generated along with the po files
  • The generated tests
  • It works on all the well-known OSes (Linux, Mac OS, Windows)
  • Lots of documentation
  • The well-thought-out user code slots

Of course I want to keep all those good points for the project. So here is an overview of what I want:

  • Easy to install and to use
  • Runs on the most common OSes
  • Code templates that are easy to customize (making multiple branches of my own templates)
  • No attempt to generate 100% of the code by spending hours on your UML diagrams
  • Being able to take an existing UML model and generate only what you want

Another point: generate something only if it saves you time. The best example I have is the tagged values in AGX, like Searchable = 1: one tagged value for one line of code!

A first piece of advice from Cédric Brun (Obeo) is not to fall into modelling the code itself, for example drawing a UML component to generate a Zope component (BrowserView, adapter, …). In that case you will lose a lot of time drawing your UML diagram and be obliged to add stereotypes (adapter, …). Following this advice, I thought about using UML component diagrams, and in the end I don’t want to use them, because for me a UML component is not the same thing as a Zope component. A UML component is closer to an egg. I need to think a bit more about that point, but it could be a great aspect of Zope code generation.

Do we need to “model” workflows and generate them from a state diagram? Here the question is a bit more complex. You need to draw them anyway to explain the workflows to your customers, which argues for specifying workflows in UML. But the permission system in Zope is specific to it, and a state diagram is not meant to express it (in AGX we use tagged values once more). And since we now use GenericSetup to specify workflows, the time saved by drawing a state diagram for your workflow is negative. So I think we will just generate the states, but not the associated permissions, which are often explained alongside the diagram in the documentation. But I would like to generate the tests associated with workflows. There was a good talk in Naples on that point.

Next, do we require stereotypes in order to generate anything, or do we do as AGX does and require a “stub” stereotype to tell the generator that a class is not a content type to generate? I personally prefer the first option. That way you can take an existing UML diagram, load the Plone 3 profile, and say: this package is an egg, this class is an ATContentType.

Well, a good demo package to build is the example from Martin Aspeli’s book.

Next time I will publish the UML model from which I want Martin’s code to be generated.

A first shot at Eclipse XML templates

Posted in eclipse, plone on January 26, 2008 by toutpt

Most text types in Eclipse can be templated through the Window->Preferences menu. Those templates can then be imported and exported. I have made a first shot for my work on Plone, because next week I have a lot of profiles to write for some old-style products. My first shot is available here under the GNU GPL license.

Those xml templates are for:

  • cssregistry.xml
  • factorytool.xml
  • jsregistry.xml
  • skins
  • types/MyTypes.xml
  • types.xml
  • workflows.xml

The rest will come later:

  • workflows/MyWF/definitions.xml
  • toolset
  • rolemap
  • properties
  • propertiestool
  • memberdata_properties
  • mailhost
  • import_step
  • export_step
  • control_panel
  • contenttyperegistry
  • catalog
  • action_icons
  • actions

As I said, this is a first shot, done in one hour just as a test, so it now needs to be improved.

Mylyn makes Eclipse run out of memory

Posted in eclipse, linux, tip on January 26, 2008 by toutpt

This week I discovered that the Mylyn project is included in most Eclipse bundles. But this plugin makes Eclipse starve for memory… Using the Window->Preferences->Show heap status option, I discovered that the Acceleo bundle starts up needing 180 MB with nothing open. With an ordinary Eclipse (PyDev, WTP), my heap status shows 35 MB. For my project of building a Plone 2.5 generator with Acceleo, I want to have a UML editor (Papyrus or TOPCASED?) and Acceleo running perfectly.

You can find some pages about this memory problem with mylyn: http://www.ibm.com/developerworks/java/library/j-mylyn2/

So start your Eclipse, have a coffee, and then go to Help->Software Updates->Manage Configuration. Disable all the Mylyn integration plugins so that you can disable the Mylyn plugin itself. Then restart Eclipse and you are fine.

With the Acceleo bundle, having disabled Subversive and Mylyn, I am now at around 40 MB on the heap status. Now I am ready to draw UML diagrams.