  1. Mail Archive Application
  2. XMAILARCH-47 Improve loading session possibilities
  3. XMAILARCH-49

Allow loading of all mails from a source in sequence, even if split over multiple "sessions"

Details

    Description

      Currently, the only options for choosing which mails to load from a specific source during a loading session are to read only emails without the "read" flag or to read all emails, and whether or not to focus on "recent" emails.

      If the need is to completely dump an existing email account, or a very big mbox file, you will want to read all emails. The first run will load, say, the first 10, but as things are currently done, further runs will try to load the same 10 emails again and again, unless you mark all emails as "unread" and let the loader flag them as "read" once loaded.
      The latter works well, but only for an email account (for an mbox file you may not be able to update the read flag, or not very easily), and it is "intrusive" (i.e. you lose your current read/unread status for this account).

      Instead of a true/false option for read/unread, it should be possible to select a "mode" for loading emails:

      • complete dump
      • only unread
      • complete dump then only unread
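
      As an illustration only, the three modes could replace the boolean with an enumeration like the one below. The names LoadingMode and phase are hypothetical and do not come from the application; the "idle" result for a finished plain dump is an assumption.

```python
from enum import Enum


class LoadingMode(Enum):
    """Hypothetical replacement for the current true/false read/unread option."""
    COMPLETE_DUMP = "complete dump"
    ONLY_UNREAD = "only unread"
    DUMP_THEN_UNREAD = "complete dump then only unread"


def phase(mode: LoadingMode, dump_completed: bool) -> str:
    """Return what a run should do: "dump", "unread" or "idle"."""
    if mode is LoadingMode.ONLY_UNREAD:
        return "unread"
    if not dump_completed:
        return "dump"
    # Dump finished: only the combined mode continues with unread emails.
    return "unread" if mode is LoadingMode.DUMP_THEN_UNREAD else "idle"
```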

      The "dump" mode assumes that we can remember where the previous run finished. Using message numbers for that is not a good idea, because message numbers are only guaranteed to be valid within a single session, and sometimes not even then.
      It is therefore better to persist the Message-ID of the last message that was loaded, together with the (ordered) list of folders that emails were loaded from. The last folder in this list should be the one containing the last message we have read. The LoadingSession is a natural place for this. The loading job status might look like a better fit, but it is valid for one job execution only, whereas this state has to persist across multiple executions. Obviously these values should not be updated by users, except perhaps by an advanced user who knows what they are doing. This information (last Message-ID, list of folders, list of sources) should be stored for each loading session.
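
      A minimal sketch of the state to persist, assuming it is tracked per source inside the loading session. The class and field names (SourceProgress, LoadingSessionProgress, record) are hypothetical, and keeping one record per source is an assumption rather than something the description fixes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceProgress:
    """Dump progress for one source (hypothetical structure)."""
    last_message_id: Optional[str] = None              # Message-ID of the last loaded mail
    folders: List[str] = field(default_factory=list)   # ordered; last entry = current folder
    done: bool = False

    def record(self, folder: str, message_id: str) -> None:
        """Remember that message_id was just loaded from folder."""
        if not self.folders or self.folders[-1] != folder:
            self.folders.append(folder)
        self.last_message_id = message_id


@dataclass
class LoadingSessionProgress:
    """State kept with the loading session, keyed by source name."""
    sources: Dict[str, SourceProgress] = field(default_factory=dict)
```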
      At start, we open the last treated folder, search for the stored Message-ID, and retrieve its message number. We also retrieve the folder's messageCount. We then retrieve all messages from that number up to messageCount.
      We then load messages either up to the specified maximum number or for the specified duration.
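
      The resume step can be sketched as follows. The folder is modelled as a plain list of Message-IDs rather than a real mail store, resume_numbers is a hypothetical helper, and two choices are assumptions: loading restarts at the message right after the stored one, and it restarts from the first message when the stored Message-ID cannot be found. The duration limit is left out.

```python
from typing import List, Optional, Tuple


def resume_numbers(folder_ids: List[str], last_message_id: Optional[str],
                   max_nb: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) message numbers to load, 1-based.

    folder_ids stands in for the open folder: index i holds the Message-ID of
    message number i + 1, so len(folder_ids) plays the role of messageCount.
    An empty range is returned as first > last.
    """
    message_count = len(folder_ids)
    if last_message_id in folder_ids:
        # Number of the last loaded message is index + 1; start right after it.
        first = folder_ids.index(last_message_id) + 2
    else:
        # Nothing loaded yet, or the message is gone: start from the top.
        first = 1
    last = min(message_count, first + max_nb - 1)
    return first, last
```

      With five messages in the folder and the second one stored as the last loaded message, a run limited to two emails would load messages 3 and 4.
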
      Regarding this maxNb: in "full dump" mode, we read sources in sequence, up to maxNb emails in total. In "only unread" mode, we read up to maxNb emails from each defined source. It is up to the user to avoid "starving", i.e. the last folder of the last source never being read because there is not enough time or maxNb budget to reach it. A source that is very active compared to the others should not be defined in the same loading session.
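
      To make the difference between the two budgets concrete, here is a small sketch. plan_counts and its parameters are hypothetical, and the pending counts are made-up inputs for illustration.

```python
from typing import Dict, List


def plan_counts(pending: Dict[str, int], order: List[str], max_nb: int,
                full_dump: bool) -> Dict[str, int]:
    """Return how many emails one run would read from each source.

    pending maps a source name to the number of emails still waiting there.
    """
    plan = {}
    remaining = max_nb
    for source in order:
        if full_dump:
            # One budget shared by all sources, consumed in sequence.
            take = min(pending[source], remaining)
            remaining -= take
        else:
            # "only unread": every source gets its own budget of max_nb.
            take = min(pending[source], max_nb)
        plan[source] = take
    return plan
```

      With a busy source holding 500 pending emails listed before a quiet one holding 3, and maxNb set to 100, the full dump plan reads 100 emails from the busy source and none from the quiet one, which is the starvation case described above.
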
      If the last treated folder is no longer configured in the source, it is marked as "DONE" and we move on to the next folder.
      If an already treated folder is no longer in the source configuration, it is not removed from the list. We loaded emails from it anyway.
      If there are no folders left to treat for a source, we mark the source as "done".
      If there are no sources left to treat for a LoadingSession, the dump is completed.
      If the mode is only "complete dump", we could check whether it is possible to stop the related scheduler job. In "complete dump" mode, new emails that appear in an already treated folder are not loaded (there is no "second phase").
      If the mode is "complete dump then only unread", we should just keep the status information about the complete dump, then update the loading session to "only unread" mode (easiest way).
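
      The folder and mode transitions above could be sketched like this. The function names, the string mode values, and the idea of passing the treated list in and mutating it are all assumptions made for the example.

```python
from typing import List, Optional


def folder_to_open(configured: List[str], treated: List[str],
                   last_exhausted: bool) -> Optional[str]:
    """Pick the folder a run should open for one source.

    treated is the persisted ordered list; it is only ever appended to.
    Returns None when no folder is left, i.e. the source is done.
    """
    if treated:
        last = treated[-1]
        # Resume the last folder only if it is still configured and not finished.
        if last in configured and not last_exhausted:
            return last
    for folder in configured:
        if folder not in treated:
            treated.append(folder)
            return folder
    return None


def mode_after_run(mode: str, sources_done: List[bool]) -> str:
    """Switch the combined mode to "only unread" once every source is done."""
    if all(sources_done) and mode == "complete dump then only unread":
        return "only unread"
    return mode
```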

          People

            jbousque Jeremie BOUSQUET