Getting all text from pages in a Plone server

published Jun 01, 2015 03:30 by admin ( last modified Jun 01, 2015 03:32 )

Start the Zope server with the debug subcommand to get an interactive Python prompt. At that prompt, "app" is the root of the Zope server. In my case the Plone site is called "site", so I assigned app.site to the variable "site". The pages are Archetypes documents, and each one has a UID() method.

>>> site = app.site
>>> res = site.portal_catalog(portal_type='Document')
>>> res.actual_result_count
2513
>>> res[0]
<Products.ZCatalog.Catalog.mybrains object at 0xb31020c>
>>> res[0]['Title']
'Some databases'
>>> res[0].getObject().SearchableText()
'Link---Better-MongoDB-Performance---Tokutek  Some databases   \r\n\tNotes to self. \r\n \r\n\tTokutek \r\n \r\n\t \r\n\tTokutek is MongoDB but allegedly with better performance for indexing and some other stuff \r\n \r\n\tThe direct benefits include high-performance indexing, strong compression, and performance stability \xe2\x80\x93 in other words, the performance stays high, even when data is larger than RAM \r\n \r\n\t\xc2\xa0 \r\n \r\n\tRead more:  Link - Better MongoDB Performance | Tokutek  \r\n \r\n\tHyperleveldb - a faster version of leveldb \r\n \r\n\t Inside HyperLevelDB :: Hacking, Distributed  \r\n \r\n\tArdb \r\n \r\n\tArdb, uses Redis protocol for accessing some fast databases, mostly leveldb. \r\n \r\n\t Ardb is a BSD licensed, redis-protocol compatible persistent storage server, it support different storage engines. Currently LevelDB/KyotoCabinet/LMDB are supported, but only LevelDB engine is well tested.  \r\n\t\xc2\xa0 \r\n \r\n\tKDr2/redis \r\n \r\n\tAnother one that does the same for leveldb only: \r\n \r\n\t KDr2/redis-leveldb \xc2\xb7 GitHub  \r\n \r\n\t\xc2\xa0 \r\n '
>>> res[0].getObject().UID
<bound method ATDocument.UID of <ATDocument at /site/index_html/Link---Better-MongoDB-Performance---Tokutek>>
>>> res[0].getObject().UID()
'0b933c2f07cb4e81a36b410429fe4e50'
>>> docs = [(obj.UID(), obj.SearchableText()) for obj in (brain.getObject() for brain in res)]
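
With a couple of thousand pages, it may be nicer to pull the text out lazily and tidy the whitespace on the way. Below is a minimal sketch of that idea, not something from the session above: the function name iter_page_text is made up for illustration, and it only assumes what the session already shows, namely that each catalog result has getObject() and each page has UID() and SearchableText().

```python
def iter_page_text(brains):
    """Yield (uid, text) pairs for each catalog result.

    iter_page_text is a hypothetical helper name. It assumes every item
    in `brains` has getObject(), and that the returned object has UID()
    and SearchableText(), as the Archetypes pages in the session do.
    """
    for brain in brains:
        # Load the full content object once per catalog result.
        obj = brain.getObject()
        # Collapse runs of whitespace (\r\n, \t, repeated spaces) into
        # single spaces so the text is easier to read or index.
        text = " ".join(obj.SearchableText().split())
        yield obj.UID(), text
```

At the debug prompt this would be used as docs = list(iter_page_text(res)), or looped over directly if the whole list does not need to be held in memory.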