Getting all text from pages in a Plone server
Start the server with the debug subcommand. At the resulting Python prompt, "app" is the root of the Zope server. In my case the Plone site is called "site", so I assigned app.site to the variable "site". The pages are Archetypes content, and each one has a UID method.
>>> res = site.portal_catalog(portal_type='Document')
>>> res.actual_result_count
2513
>>> res[0]
<Products.ZCatalog.Catalog.mybrains object at 0xb31020c>
>>> res[0]['Title']
'Some databases'
>>> res[0].getObject().SearchableText()
'Link---Better-MongoDB-Performance---Tokutek Some databases \r\n\tNotes to self. \r\n \r\n\tTokutek \r\n \r\n\t \r\n\tTokutek is MongoDB but allegedly with better performance for indexing and some other stuff \r\n \r\n\tThe direct benefits include high-performance indexing, strong compression, and performance stability \xe2\x80\x93 in other words, the performance stays high, even when data is larger than RAM \r\n \r\n\t\xc2\xa0 \r\n \r\n\tRead more: Link - Better MongoDB Performance | Tokutek \r\n \r\n\tHyperleveldb - a faster version of leveldb \r\n \r\n\t Inside HyperLevelDB :: Hacking, Distributed \r\n \r\n\tArdb \r\n \r\n\tArdb, uses Redis protocol for accessing some fast databases, mostly leveldb. \r\n \r\n\t Ardb is a BSD licensed, redis-protocol compatible persistent storage server, it support different storage engines. Currently LevelDB/KyotoCabinet/LMDB are supported, but only LevelDB engine is well tested. \r\n\t\xc2\xa0 \r\n \r\n\tKDr2/redis \r\n \r\n\tAnother one that does the same for leveldb only: \r\n \r\n\t KDr2/redis-leveldb \xc2\xb7 GitHub \r\n \r\n\t\xc2\xa0 \r\n '
>>> res[0].getObject().UID
<bound method ATDocument.UID of <ATDocument at /site/index_html/Link---Better-MongoDB-Performance---Tokutek>>
>>> res[0].getObject().UID()
'0b933c2f07cb4e81a36b410429fe4e50'
>>> docs = [(doc.getObject().UID(), doc.getObject().SearchableText()) for doc in res]
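The last line of the session calls getObject() twice for every catalog result, so each page is fetched two times. Below is a minimal sketch that fetches each page once. It assumes only what the session shows: every catalog result has getObject(), and every page has UID() and SearchableText(). The helper name collect_texts is hypothetical, not part of Plone.

```python
def collect_texts(brains):
    """Return a list of (uid, text) pairs, one per catalog result.

    `brains` is any iterable of catalog results that offer getObject();
    the returned objects are assumed to offer UID() and SearchableText().
    """
    docs = []
    for brain in brains:
        # Fetch the full content object once and reuse it for both calls.
        obj = brain.getObject()
        docs.append((obj.UID(), obj.SearchableText()))
    return docs
```

At the debug prompt this would be used as docs = collect_texts(res), giving the same list of pairs as the comprehension above with half as many getObject() calls.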