Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.2 RC1
    • Fix Version/s: 2.0 M1
    • Component/s: XEclipse
    • Labels:
      None
    • Environment:
      XWiki Eclipse 1.2 RC1, XWiki Enterprise 2.0.2, MySQL 5.1.40, Tomcat 6.0.20, Java 1.6.0, Debian GNU/Linux 5.0
    • keywords:
      patch
    • Similar issues:

      Description

      1. Start XEclipse.

      2. Create a new page in the sandbox:

      File
      -> New page
      -> Space: Sandbox
      -> Name: loremipsum
      -> Title: Lorem Ipsum
      -> Finish
      

      3. Copy the contents of loremipsum.txt and paste it with CTRL+V into the new page.

      4. XEclipse is blocked for 154 seconds.

      5. Save the new page with CTRL+S.

      6. XEclipse is blocked for 160 seconds.

      1. Improvement_XECLIPSE-144.txt
        13 kB
        Venkatesh Nandakumar
      2. loremipsum.txt
        15 kB
        Clemens Fuchslocher
      3. XECLIPSE-144-20091128-1.patch
        14 kB
        Clemens Fuchslocher
      4. XECLIPSE-144-20091128-2.patch
        14 kB
        Clemens Fuchslocher
      5. XECLIPSE-144-20091128-3.patch
        15 kB
        Clemens Fuchslocher
      6. XECLIPSE-144-20091129-1.patch
        16 kB
        Clemens Fuchslocher

        Issue Links

          Activity

          Clemens Fuchslocher made changes -
          Field Original Value New Value
          Attachment loremipsum.txt [ 16201 ]
          Hide
          Clemens Fuchslocher added a comment -

          Same results with Windows XP SP3 and VirtualBox 3.0.4.

          Show
          Clemens Fuchslocher added a comment - Same results with Windows XP SP3 and VirtualBox 3.0.4.
          Venkatesh Nandakumar made changes -
          Assignee Venkatesh Nandakumar [ vnkatesh ]
          Venkatesh Nandakumar made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          Venkatesh Nandakumar added a comment - - edited

          This one is tough.

          Bug lies here: http://svn.xwiki.org/svnroot/xwiki/contrib/sandbox/xeclipse/org.xwiki.eclipse.ui/src/main/java/org/xwiki/eclipse/ui/editors/scanners/XWikiMarkupScanner.java

          The (multiple instances of ) RegExRule takes too much time, but there is no better way to do it. SingleLineRule, PatternRule fails in many cases and is simply not good enough.

          Just interchanging the rules, (pushing down the RegExRule below does some good, but not in this specific case, since "lorem impsum" does not satisfy any rule and all the rules have to be checked)

          Still checking if there are any other alternatives. Any ideas?

          Show
          Venkatesh Nandakumar added a comment - - edited This one is tough. Bug lies here: http://svn.xwiki.org/svnroot/xwiki/contrib/sandbox/xeclipse/org.xwiki.eclipse.ui/src/main/java/org/xwiki/eclipse/ui/editors/scanners/XWikiMarkupScanner.java The (multiple instances of ) RegExRule takes too much time, but there is no better way to do it. SingleLineRule, PatternRule fails in many cases and is simply not good enough. Just interchanging the rules, (pushing down the RegExRule below does some good, but not in this specific case, since "lorem impsum" does not satisfy any rule and all the rules have to be checked) Still checking if there are any other alternatives. Any ideas?
          Hide
          Sergiu Dumitriu added a comment -

          The regular expressions could be optimized, making them use possessive quantifiers.

          Show
          Sergiu Dumitriu added a comment - The regular expressions could be optimized, making them use possessive quantifiers.
          Hide
          Sergiu Dumitriu added a comment -

          Venkatesh, could you try replacing expressions like:

          new RegExRule("1 .*\n?", heading1Token)
          

          with

          new RegExRule("^1 .*+$", heading1Token)
          

          and see if it works better?

          Also, I think the proper expression would be:

          new RegExRule("^[ \t]*+1[ \t]*+.*+$", heading1Token)
          

          since spaces and tabs are allowed at the start of the line. Apply a similar change to all the other regex rules if it works.

          Show
          Sergiu Dumitriu added a comment - Venkatesh, could you try replacing expressions like: new RegExRule("1 .*\n?", heading1Token) with new RegExRule("^1 .*+$", heading1Token) and see if it works better? Also, I think the proper expression would be: new RegExRule("^[ \t]*+1[ \t]*+.*+$", heading1Token) since spaces and tabs are allowed at the start of the line. Apply a similar change to all the other regex rules if it works.
          Hide
          Fabio Mancinelli added a comment -

          Not directly related to this issue by maybe worth to be discussed...

          I did some experiments using the rendering engine in order to reuse the information extracted by the renderer to perform the syntax highlighting. There are two problems that makes this approach inapplicable :

          1) Text offsets are not retrievable from the rendering API. So apparently there is no way for understanding where a given block starts and ends (at least I haven't been successful in finding one) And this information is needed by the Eclipse text framework.

          2) The "transformation" source_xwiki2 -> XDOM -> source_xwiki2 is not an identity. There is some additional logic that "sanitizes" the source once it's rendered. For example the source "*a ##b* c##", once rendered becomes "*a ##b##* ##c##" or something like that. So even if we would be able to understand block's offsets, those offsets would be relative to a source that is not the original one.

          Another idea I had was to reuse the low level grammar (http://bit.ly/6KLjkK) in order to build a parser that could parse the editor's content and give directly usable offsets. Even in this case I found some problems

          1) The grammar has embedded logic for generating parsing events (onXYZ, beginXYZ, endXYZ) and most of that logic is useless in our context. It would be nice to have a cleaned up grammar with only the specification in order to plug our "offset finding" logic more easily. However this is not trivial
          2) Javacc doesn't provide linear offsets, but each token has begin and end information represented with line and column. Nothing that's transcendent but quite annoying.
          3) Maybe there are performance problems, because the parser has to reparse everything at each keystroke (unless it is incremental)

          Anyway this solution implies the maintaining of a parallel version of the javacc grammars that should be kept in synch with the main one.

          Since it seems that there in no alternative to the need of maintaining another grammar, the third idea is that it would be worth to use XText (http://www.eclipse.org/Xtext/). XText is a framework for writing DSLs, and it is able to automatically generate the editor for the specified language. So basically, once we specify the XWiki grammar we will have for free the editor with syntax highlighting, completion etc. However I don't know how much flexible is the generated code and how complex is to extend it (for example, we might be willing to plug the groovy highlighter when we are inside a groovy macro block)

          Of course we should rewrite the XWiki grammar using XText's grammar definition language (http://www.eclipse.org/Xtext/documentation/latest/xtext.html#grammarLanguage) which might be difficult.

          Show
          Fabio Mancinelli added a comment - Not directly related to this issue by maybe worth to be discussed... I did some experiments using the rendering engine in order to reuse the information extracted by the renderer to perform the syntax highlighting. There are two problems that makes this approach inapplicable : 1) Text offsets are not retrievable from the rendering API. So apparently there is no way for understanding where a given block starts and ends (at least I haven't been successful in finding one) And this information is needed by the Eclipse text framework. 2) The "transformation" source_xwiki2 -> XDOM -> source_xwiki2 is not an identity. There is some additional logic that "sanitizes" the source once it's rendered. For example the source "* a ##b * c##", once rendered becomes "* a ##b## * ##c##" or something like that. So even if we would be able to understand block's offsets, those offsets would be relative to a source that is not the original one. Another idea I had was to reuse the low level grammar ( http://bit.ly/6KLjkK ) in order to build a parser that could parse the editor's content and give directly usable offsets. Even in this case I found some problems 1) The grammar has embedded logic for generating parsing events (onXYZ, beginXYZ, endXYZ) and most of that logic is useless in our context. It would be nice to have a cleaned up grammar with only the specification in order to plug our "offset finding" logic more easily. However this is not trivial 2) Javacc doesn't provide linear offsets, but each token has begin and end information represented with line and column. Nothing that's transcendent but quite annoying. 3) Maybe there are performance problems, because the parser has to reparse everything at each keystroke (unless it is incremental) Anyway this solution implies the maintaining of a parallel version of the javacc grammars that should be kept in synch with the main one. Since it seems that there in no alternative to the need of maintaining another grammar, the third idea is that it would be worth to use XText ( http://www.eclipse.org/Xtext/ ). XText is a framework for writing DSLs, and it is able to automatically generate the editor for the specified language. So basically, once we specify the XWiki grammar we will have for free the editor with syntax highlighting, completion etc. However I don't know how much flexible is the generated code and how complex is to extend it (for example, we might be willing to plug the groovy highlighter when we are inside a groovy macro block) Of course we should rewrite the XWiki grammar using XText's grammar definition language ( http://www.eclipse.org/Xtext/documentation/latest/xtext.html#grammarLanguage ) which might be difficult.
          Hide
          Thomas Mortagne added a comment - - edited

          IMO we should introduce offset information in rendering api, it add a very useful feature to the API for debug and it make a lot easier to do code highlight for XEclipse but also to have wiki syntax support in the code macro for example.

          Show
          Thomas Mortagne added a comment - - edited IMO we should introduce offset information in rendering api, it add a very useful feature to the API for debug and it make a lot easier to do code highlight for XEclipse but also to have wiki syntax support in the code macro for example.
          Thomas Mortagne made changes -
          Link This issue relates to XWIKI-4472 [ XWIKI-4472 ]
          Hide
          Venkatesh Nandakumar added a comment - - edited

          Sergiu: The problem was not with the regular expressions itself, but it was at CharacterScannerCharSequence.java:unread(int n) function, which was called repeatedly at RegExRule.java:evaluate() function, and this was inefficient. The different expressions that you suggested did not improve the running time.

          The patch provided, does not fully iron out the problem. It improves RegExRule.java to call scanner.unread() minimally. I got almost a 250% improvement while pasting http://dev.xwiki.org/xwiki/bin/view/Community/BuildingInEclipse?viewer=code&showlinenumbers=0 [10 Seconds -> 4 Seconds]

          Lorem Ipsum is the worst case scenario, and it still blocks while pasting. (Was not able to test running time on my low end laptop). Can anyone try and see if it works better with this patch?

          Using another implementation of RegExRule.java, http://bit.ly/4CNluu did not help.

          As of now, I think nothing can be done to improve this.

          Show
          Venkatesh Nandakumar added a comment - - edited Sergiu: The problem was not with the regular expressions itself, but it was at CharacterScannerCharSequence.java:unread(int n) function, which was called repeatedly at RegExRule.java:evaluate() function, and this was inefficient. The different expressions that you suggested did not improve the running time. The patch provided, does not fully iron out the problem. It improves RegExRule.java to call scanner.unread() minimally. I got almost a 250% improvement while pasting http://dev.xwiki.org/xwiki/bin/view/Community/BuildingInEclipse?viewer=code&showlinenumbers=0 [10 Seconds -> 4 Seconds] Lorem Ipsum is the worst case scenario, and it still blocks while pasting. (Was not able to test running time on my low end laptop). Can anyone try and see if it works better with this patch? Using another implementation of RegExRule.java, http://bit.ly/4CNluu did not help. As of now, I think nothing can be done to improve this.
          Venkatesh Nandakumar made changes -
          Attachment Improvement_XECLIPSE-144.txt [ 16259 ]
          Hide
          Clemens Fuchslocher added a comment -

          Without the patch:

          • Building XWiki in Eclipse: 3 Seconds
          • Lorem Ipsum: 142 Seconds

          With the patch:

          • Building XWiki in Eclipse: 2 Seconds
          • Lorem Ipsum: 60 Seconds
          Show
          Clemens Fuchslocher added a comment - Without the patch: Building XWiki in Eclipse: 3 Seconds Lorem Ipsum: 142 Seconds With the patch: Building XWiki in Eclipse: 2 Seconds Lorem Ipsum: 60 Seconds
          Hide
          Clemens Fuchslocher added a comment -

          1. If I use xwiki-eclipse-rcp-linux-gtk-x86-1.2-rc-1.zip, then pasting loremipsum.txt tooks 140 seconds.

          2. If I build the trunk version of XEclipse RCP with Maven, then loremipsum.txt tooks 140 seconds without Improvement_XECLIPSE-144.txt, and 60 seconds with it.

          3. If I start the trunk version of XEclipse RCP from within Eclipse, then loremipsum.txt tooks 50 seconds without the patch, and 20 seconds with the patch.

          I don't know why there is such a huge difference between the 2. and 3. case. In both cases, the same versions of Eclipse 3.4.0 and Java 1.6.0 are shown in the configuration details.

          Show
          Clemens Fuchslocher added a comment - 1. If I use xwiki-eclipse-rcp-linux-gtk-x86-1.2-rc-1.zip, then pasting loremipsum.txt tooks 140 seconds. 2. If I build the trunk version of XEclipse RCP with Maven , then loremipsum.txt tooks 140 seconds without Improvement_XECLIPSE-144.txt , and 60 seconds with it. 3. If I start the trunk version of XEclipse RCP from within Eclipse , then loremipsum.txt tooks 50 seconds without the patch, and 20 seconds with the patch. I don't know why there is such a huge difference between the 2. and 3. case. In both cases, the same versions of Eclipse 3.4.0 and Java 1.6.0 are shown in the configuration details.
          Hide
          Clemens Fuchslocher added a comment - - edited

          With XECLIPSE-144-20091128-1.patch the parsing time of loremipsum.txt dropped down to a second.

          The patch was created against the current trunk: http://svn.xwiki.org/svnroot/xwiki/xeclipse/trunk/plugins/org.xwiki.eclipse.ui/.

          I compared the syntax highlighting for the following documents with the one from xwiki-eclipse-rcp-linux-gtk-x86-1.2-rc-1.zip. The parsing time was always under a second. No difference in look.

          Only tested under Linux.

          The line separator handling should also be tested under Windows and Mac OS X.

          Show
          Clemens Fuchslocher added a comment - - edited With XECLIPSE-144-20091128-1.patch the parsing time of loremipsum.txt dropped down to a second. The patch was created against the current trunk: http://svn.xwiki.org/svnroot/xwiki/xeclipse/trunk/plugins/org.xwiki.eclipse.ui/ . I compared the syntax highlighting for the following documents with the one from xwiki-eclipse-rcp-linux-gtk-x86-1.2-rc-1.zip. The parsing time was always under a second. No difference in look. XWiki Syntaxes Building XWiki in Eclipse XWiki RESTful API Download XWiki Products Only tested under Linux. The line separator handling should also be tested under Windows and Mac OS X.
          Clemens Fuchslocher made changes -
          Attachment XECLIPSE-144-20091128-1.patch [ 16280 ]
          Hide
          Fabio Mancinelli added a comment -

          Clemens that's great. Thank you.
          I'll try to review the patch during the weekend and apply it!

          Show
          Fabio Mancinelli added a comment - Clemens that's great. Thank you. I'll try to review the patch during the weekend and apply it!
          Hide
          Clemens Fuchslocher added a comment -

          Here is an updated version of the patch: XECLIPSE-144-20091128-2.patch.

          A comment was fixed and two unnecessarily created StringBuffer objects were removed.

          Show
          Clemens Fuchslocher added a comment - Here is an updated version of the patch: XECLIPSE-144-20091128-2.patch . A comment was fixed and two unnecessarily created StringBuffer objects were removed.
          Clemens Fuchslocher made changes -
          Attachment XECLIPSE-144-20091128-2.patch [ 16282 ]
          Hide
          Clemens Fuchslocher added a comment -

          One more update: XECLIPSE-144-20091128-3.patch.

          • Added a rule for definition lists.
          • Fixed pattern for the level six header.
          • Renamed class HeadlineRule to HeaderRule.
          Show
          Clemens Fuchslocher added a comment - One more update: XECLIPSE-144-20091128-3.patch . Added a rule for definition lists. Fixed pattern for the level six header. Renamed class HeadlineRule to HeaderRule.
          Clemens Fuchslocher made changes -
          Attachment XECLIPSE-144-20091128-3.patch [ 16284 ]
          Hide
          Sergiu Dumitriu added a comment -

          OK, the idea of the patch looks good, the style looks good, but there are a few problems:

          • Why does DefinitionListRule extend HeaderRule? Semantically it doesn't make sense. You could extract a common MultiSyntaxRule, and either make the two rules descendants, or drop them completely, since they don't have any particularities.
          • patternToken is not a very good name for that map
          Show
          Sergiu Dumitriu added a comment - OK, the idea of the patch looks good, the style looks good, but there are a few problems: Why does DefinitionListRule extend HeaderRule? Semantically it doesn't make sense. You could extract a common MultiSyntaxRule, and either make the two rules descendants, or drop them completely, since they don't have any particularities. patternToken is not a very good name for that map
          Hide
          Clemens Fuchslocher added a comment -

          Why does DefinitionListRule extend HeaderRule? Semantically it doesn't make sense.

          You are right. With this inheritance I only reuse the evaluate method of HeaderRule. So it's only a technical choice.

          At first, I added the Constants.DEFINITION_TERM_PATTERN to the header rules in XWikiMarkupScanner:

          HeaderRule headers = new HeaderRule();
          ...
          headers.add(Constants.DEFINITION_TERM_PATTERN, definitionTermToken);
          

          But that would be far too dirty, because a definition list is not a header. So I introduced the DefinitionListRule to make this distinction clear. And then the inheritance was just handy.

          I can remove the inheritance and create a seperate evaluate method in DefinitionListRule. With this I also can save the unneeded while loop in that evaluate method, because there is only one definition list pattern to test for. This can also improve the performance a bit.

          patternToken is not a very good name for that map

          Would tokenLookup be OK? I lookup a token by its pattern.

          Show
          Clemens Fuchslocher added a comment - Why does DefinitionListRule extend HeaderRule? Semantically it doesn't make sense. You are right. With this inheritance I only reuse the evaluate method of HeaderRule. So it's only a technical choice. At first, I added the Constants.DEFINITION_TERM_PATTERN to the header rules in XWikiMarkupScanner: HeaderRule headers = new HeaderRule(); ... headers.add(Constants.DEFINITION_TERM_PATTERN, definitionTermToken); But that would be far too dirty, because a definition list is not a header. So I introduced the DefinitionListRule to make this distinction clear. And then the inheritance was just handy. I can remove the inheritance and create a seperate evaluate method in DefinitionListRule. With this I also can save the unneeded while loop in that evaluate method, because there is only one definition list pattern to test for. This can also improve the performance a bit. patternToken is not a very good name for that map Would tokenLookup be OK? I lookup a token by its pattern.
          Hide
          Clemens Fuchslocher added a comment -

          I can reimplement DefinitionListRule like this:

          public class DefinitionListRule extends ListRule
          {
              public DefinitionListRule(String regex, IToken token)
              {
                  super(regex, token);
              }
          
              @Override
              public IToken evaluate(ICharacterScanner scanner)
              {
                  IToken token = super.evaluate(scanner);
                  // Is there a definition list?
                  if (token == Token.UNDEFINED) {
                      // No.
                      return token;
                  }
          
                  // Yes. Consume the rest of the line.
                  readLine(scanner);
                  return token;
              }
          }
          
          • Semantic benefit: DefinitionListRule is a ListRule.
          • Technical benefit: DefinitionListRule reuses the evaluate method of ListRule without any code duplication.
          Show
          Clemens Fuchslocher added a comment - I can reimplement DefinitionListRule like this: public class DefinitionListRule extends ListRule { public DefinitionListRule(String regex, IToken token) { super(regex, token); } @Override public IToken evaluate(ICharacterScanner scanner) { IToken token = super.evaluate(scanner); // Is there a definition list? if (token == Token.UNDEFINED) { // No. return token; } // Yes. Consume the rest of the line. readLine(scanner); return token; } } Semantic benefit: DefinitionListRule is a ListRule. Technical benefit: DefinitionListRule reuses the evaluate method of ListRule without any code duplication.
          Hide
          Sergiu Dumitriu added a comment -

          Yes, sounds good. Could you prepare a final patch?

          Show
          Sergiu Dumitriu added a comment - Yes, sounds good. Could you prepare a final patch?
          Hide
          Clemens Fuchslocher added a comment -
          Show
          Clemens Fuchslocher added a comment - Please see XECLIPSE-144-20091129-1.patch .
          Clemens Fuchslocher made changes -
          Attachment XECLIPSE-144-20091129-1.patch [ 16285 ]
          Hide
          Vincent Massol added a comment -

          Venkatesh, are you still active on this (it's marked in progress)?

          Show
          Vincent Massol added a comment - Venkatesh, are you still active on this (it's marked in progress)?
          Vincent Massol made changes -
          Assignee Venkatesh Nandakumar [ vnkatesh ]
          Vincent Massol made changes -
          keywords patch
          Vincent Massol made changes -
          Fix Version/s 1.2 RC2 [ 12440 ]
          Vincent Massol made changes -
          Assignee Vincent Massol [ vmassol ]
          Vincent Massol made changes -
          Status In Progress [ 3 ] Open [ 1 ]
          Vincent Massol made changes -
          Assignee Vincent Massol [ vmassol ]
          Vincent Massol made changes -
          Fix Version/s 2.0 M1 [ 12542 ]
          Fix Version/s 1.2 RC2 [ 12440 ]
          Hide
          Fabio Mancinelli added a comment -

          Applied patch. Thanks Clemens (and I apologize for the huge delay)

          Fixed at https://github.com/xwiki/xwiki-eclipse/commit/8952bb5a74a7690d9df109ed1720635f4c8e4e42

          Show
          Fabio Mancinelli added a comment - Applied patch. Thanks Clemens (and I apologize for the huge delay) Fixed at https://github.com/xwiki/xwiki-eclipse/commit/8952bb5a74a7690d9df109ed1720635f4c8e4e42
          Fabio Mancinelli made changes -
          Status Open [ 1 ] Closed [ 6 ]
          Assignee Fabio Mancinelli [ fmancinelli ]
          Resolution Fixed [ 1 ]

            People

            • Assignee:
              Fabio Mancinelli
              Reporter:
              Clemens Fuchslocher
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Date of First Response: