0

我正在开发一个 GWT 过滤器以使我的 GWT 应用程序可抓取。这个想法是,当它发现这样一个丑陋的 URL 时:

http://www.myapp.com/?_escaped_fragment_=v;id=Mv67mC13Yizr

介绍一个好的:

http://www.myapp.com/#!v;id=Mv67mC13Yizr

但是,代码永远不会到达 doFilter()。为什么?

Web.xml

<filter>
    <filter-name>guiceFilter</filter-name>
    <filter-class>com.google.inject.servlet.GuiceFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>guiceFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

DispatchServletModule.java

public class DispatchServletModule extends ServletModule {

    @Override
    public void configureServlets() {
        serve("/" + ActionImpl.DEFAULT_SERVICE_NAME)
                .with(DispatchServiceImpl.class);
        filter("/").through(CrawlerServiceImpl.class);
    }
}

CrawlerServiceImpl.java

    @Singleton
    public final class CrawlerServiceImpl implements Filter {
      private static final String ESCAPED_FRAGMENT_FORMAT1 = "_escaped_fragment_=";
      private  final int ESCAPED_FRAGMENT_LENGTH1 = ESCAPED_FRAGMENT_FORMAT1.length();
      private static final String ESCAPED_FRAGMENT_FORMAT2 = "&"+ESCAPED_FRAGMENT_FORMAT1;
      private  final int ESCAPED_FRAGMENT_LENGTH2 = ESCAPED_FRAGMENT_FORMAT2.length();

      @Inject(optional = true)
      private final Provider<WebClient> webClientProvider = null;

      @Override
      public void init(FilterConfig filterConfig) throws ServletException {  
      }
      @Override
      public void destroy() {
      }

      @Override
      public void doFilter(ServletRequest request, ServletResponse response,
          FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse res = (HttpServletResponse) response;
        String queryString = req.getQueryString();

        final String requestURI = req.getRequestURI();
        if ((queryString != null) && (queryString.contains(ESCAPED_FRAGMENT_FORMAT1))) {
          try {
            StringBuilder pageNameSb = new StringBuilder("http://");
            pageNameSb.append(req.getServerName());
            if (req.getServerPort() != 0) {
              pageNameSb.append(":");
              pageNameSb.append(req.getServerPort());
            }
            pageNameSb.append(requestURI);
            queryString = rewriteQueryString(queryString);
            pageNameSb.append(queryString);
            String pageName = pageNameSb.toString();
            WebClient webClient;
            if( webClientProvider == null )
              webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
            else
              webClient = webClientProvider.get();

            webClient.setThrowExceptionOnScriptError(false);
            webClient.setJavaScriptEnabled(true);
            HtmlPage page = webClient.getPage( pageName );

            res.setContentType("text/html;charset=UTF-8");
            PrintWriter out = res.getWriter();
            out.println("<hr />");
            out.println("<center><h3>You are viewing a non-interactive page that is intended for the crawler.  "
                + "You probably want to see this page: <a href=\""
                + pageName
                + "\">"
                + pageName + "</a></h3></center>");
            out.println("<hr />");     
            out.println(page.asXml());
            webClient.closeAllWindows();
            out.println("");
            out.close();
          }
          catch( Exception e ) {
          }
        } else {
          chain.doFilter(request, response);
        }
      }

      private String rewriteQueryString(String queryString) throws UnsupportedEncodingException {
        int index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT2);
        int length = ESCAPED_FRAGMENT_LENGTH2;
        if (index == -1) {
          index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT1);
          length = ESCAPED_FRAGMENT_LENGTH1;
        }
        if (index != -1) {
          StringBuilder queryStringSb = new StringBuilder();
          if (index > 0) {
            queryStringSb.append("?");
            queryStringSb.append(queryString.substring(0, index));
          }
          queryStringSb.append("#!");
          queryStringSb.append(URLDecoder.decode(queryString.substring(index
              + length, queryString.length()), "UTF-8"));
          return queryStringSb.toString();
        }
        return queryString;
      }
}
4

2 回答 2

0

Guice 采用所有过滤器。要添加过滤器,您需要在 guice servlet 模块中声明它:

filter("/?_escaped_fragment_=*").through(CrawlerServiceImpl.class);

于 2013-04-11T09:35:32.033 回答
0

Your<url-pattern>无效,*只允许作为模式的/*后缀或*.前缀;并且模式仅适用于路径,而不适用于查询字符串。

您必须将您的过滤器映射到参数/的过滤器检查_escaped_fragment_(我会亲自检查,然后使用)来决定是使用您getMethod()在服务器端获取和呈现页面,还是简单地链接到下一个过滤器。"GET"getParameter("_escaped_fragment_")WebClient

请注意,当您在您的过滤器中声明web.xml它不会被 Guice 注入时,所以就像Dvd Prd所说的那样,您可能宁愿在您的 Guice 中声明过滤器ServletModule。请注意,与标准映射类似,只有路径匹配,因此上述内容仍然适用(甚至 afilterRegex()也行不通)。

于 2013-04-11T11:26:45.340 回答