Integrating Non-Plone Content

 

Having problems? Check the troubleshooting checklist or the common tasks in Enfold Proxy or the FAQ.

Includes and Excludes

Includes and Excludes allow you to tailor which URLs are processed by Enfold Proxy (EP) and which are processed by Internet Information Services (IIS), so that Plone and non-Plone content can share a domain without causing URL conflicts. This is useful in several scenarios:

  • An application already existed in IIS before you installed Plone.
  • You have directories of static content from previous websites which need to be included under the same domain.
  • You wish to install a separate non-Plone application and place it in a directory inside the domain for your Plone application.
  • You wish to wrap a Plone site inside a non-Plone site.
  • You need to keep existing non-Plone content at the root of your domain, but also let Plone and Enfold Proxy serve most of the content.

To add Excludes or Includes, open the EP configuration tool, choose your proxy definition and select the tab for Excludes or Includes.

You use Excludes when you want your Plone site to use ALL of the domain except the things you have explicitly excluded. You use Includes when you want your Plone site to use NONE of the domain except the files or directories you have explicitly included.

Important: Includes and Excludes should not be used as a way to prevent access to parts of sites. These options only affect people who type the www URL; they do not block access for those who type in an IP address and port. To harden your websites, it's more effective to set up security within the Plone application itself.

Integrating Non-Plone Content: Testing and Deployment

  1. Create a test IIS site with www.fakedomain.com (a site only for testing).
  2. Verify that you can access the server root of www.fakedomain.com (by setting DNS or editing your HOST file).
  3. Add the web application into this IIS site. Steps may vary depending on the application, but you basically need to add a directory (real or virtual) into IIS. This directory should correspond to the directory (i.e., URL path) you wish to use when you integrate it with your Plone site.
  4. Create proxy definition(s) to allow Plone to be served for the same web domain. (In your Proxy Definition, select the IIS site for fakedomain in the Site tab.)
  5. Add excludes and includes to the appropriate Proxy Definition(s) as needed.
  6. After you have verified that everything works with the IIS site that serves www.fakedomain.com, you can go live simply by modifying your Proxy Definition(s) so that they stop using the IIS site for www.fakedomain.com and start using www.realdomain.com .

Using Excludes

Usually, Enfold Proxy handles all the traffic to a host or domain, but what if you wish to prevent Plone from handling a certain directory or set of files? The solution is to use Excludes.

The Exclude tab of your proxy definition lets you exclude directories or specific files from being handled by Plone. Suppose you need to integrate a Java application (let's call it "java application") into your Plone site. Your end goal is to have http://www.originalfunsite.com/java_application/ go to your Java application, while every other directory is handled by Plone/Enfold.

Any names listed in this section must match the initial path specification of the request exactly - substrings are not matched, although case is ignored.

Assumptions: you have a Plone site which is located at the root of http://www.originalfunsite.com . (See adding a proxy definition to learn how to do that).

  1. Open IIS and confirm that the IIS site has a directory (or virtual directory) for your non-Plone application. If you don't have an index.htm or index.html file, create one. (Right click the IIS site --> New --> Virtual Directory).

    images/excludes-java-application2.png
  2. Verify that this directory can be accessed when the proxy definition for the domain or host is inactive. (If you are dealing with a live site, you may need to use a fakedomain to test this. See Integrating Non-Plone Content: Testing and Deployment.)

  3. Edit the Exclude tab of your proxy definition. Add the directory java_application (without a slash at the end) to the field Local Excludes.

  4. After pressing save, type http://www.originalfunsite.com/java_application/ in your browser. You should see whatever is in the index.html file for your java_application directory.

Testing and Troubleshooting: If you are having problems, you can run the command-line utility eep_check with the --show-excludes argument to print a list of all items found by auto-exclude processing. You can also run the Check utility to see whether it reports any error messages. See also the troubleshooting checklist.

Note: When you list something in the Exclude field, you cannot use a slash (/). For that reason, you are not able to exclude deep folders (i.e., paths with more than one level: /java_application/small_applet/ ).

Excludes: Additional Details

Excludes are not case-sensitive.

Any listed names in this section must match the initial path specification of the request exactly - substrings are not matched (although case is ignored). The names are matched after any Local Host Root parts have been removed. The names listed must not include any slashes - only the name portion must be listed. As a special case, you may specify a forward slash (/) to indicate the root of the site. Note that specifying the root will not automatically specify any default pages in that root (such as index.html) - they need to be explicitly specified if required.

For example, if your Local Host Root is /Plone and you specify excludes as MyDir, then any requests to /Plone/MyDir, /Plone/MyDir/my_file, /Plone/mydir etc. will be excluded (i.e., ignored by the proxy and handled by IIS), but requests to /Plone/MyDir2, /Plone/Other/MyDir etc. will all be processed normally. In this example, requests to /MyDir would not be processed by EP at all, as they fall outside our Local Host Root.
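The matching rule in this example can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not EP's actual implementation; the function name classify and its return values are hypothetical.

```python
def classify(request_path, local_host_root, excludes):
    """Return "outside", "iis" or "proxy" for a request path (illustrative only)."""
    root = local_host_root.rstrip("/")
    # Requests outside the Local Host Root are not handled by this definition.
    if not (request_path == root or request_path.startswith(root + "/")):
        return "outside"
    # Remove the Local Host Root, then look only at the first path segment.
    remainder = request_path[len(root):].strip("/")
    first_segment = remainder.split("/", 1)[0]
    if first_segment == "":
        # The site root is excluded only if "/" is listed explicitly.
        return "iis" if "/" in excludes else "proxy"
    # Whole-name comparison, ignoring case; substrings do not match.
    names = {name.lower() for name in excludes}
    return "iis" if first_segment.lower() in names else "proxy"


for path in ["/Plone/MyDir", "/Plone/mydir", "/Plone/MyDir2", "/Plone/Other/MyDir", "/MyDir"]:
    print(path, "->", classify(path, "/Plone", ["MyDir"]))
```

Because only the first segment after the Local Host Root is compared, /Plone/Other/MyDir is still proxied even though MyDir appears later in the path.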

If you require more flexibility (for example, exclude MyDir wherever it appears in the URL), see excludes_regex below.

Using Auto-Excludes to Maintain Existing URLs

The main function of auto-excludes is to maintain existing URLs from IIS after you add a Plone site. Suppose you want Plone to handle most of a domain, but you have legacy content still at the root that needs to keep working. Auto-excludes let EP check first whether IIS content already exists for a URL; if it does, IIS will serve it; otherwise, Plone/Enfold will handle it.

Note: Be careful that existing/legacy IIS content is not preventing people from accessing important parts of the Plone site!

To set up an Auto-Exclude, select your proxy definition and choose the Excludes tab. If you use Auto-Excludes, you have to choose what kinds of web content will cause IIS to ignore Plone. (If you want an Auto-Exclude, more than likely you will want to choose All). The dropdown box for Auto-exclude shows these options:

  • Choosing (All) will check to see if there are any virtual directories, file system directories or files for a URL under the IIS site. If yes, then the IIS site will handle it. If no, then Enfold/Plone will handle it. This is probably the most common choice if you plan to set up auto-excludes.
  • Choosing (webdirs) will only exclude virtual directories specified within the IIS site. A virtual directory does not physically exist in the file system underneath the IIS root. Instead it may refer to an application or a directory in another location (analogous to a Windows shortcut).
  • Choosing (fsdir) will only exclude file directories on the file system existing underneath the root for the IIS site. These directories actually appear in Windows Explorer (and not merely IIS).
  • Choosing (files) will tell IIS to serve files inside IIS document root instead of Plone whenever such a file or files exist.
  • Choosing (None) will deactivate all auto-excludes and just cause Enfold Proxy to assume that all HTTP requests should be handled by Plone.
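As a rough mental model of these options, the sketch below collects the names that each mode would leave to IIS. It assumes that auto-exclude only looks at top-level names under the IIS site root, which the text above does not state; the function find_auto_excludes, its parameters and the lowercase mode names are hypothetical and are not part of EP.

```python
from pathlib import Path


def find_auto_excludes(iis_root, virtual_dirs, mode="all"):
    """Collect top-level names that would be left to IIS (illustrative only)."""
    root = Path(iis_root)
    found = set()
    if mode in ("all", "webdirs"):
        # Virtual directories are known to IIS only; they are not on disk under the root.
        found.update(virtual_dirs)
    if mode in ("all", "fsdir"):
        # Real directories directly underneath the IIS site root.
        found.update(p.name for p in root.iterdir() if p.is_dir())
    if mode in ("all", "files"):
        # Real files directly underneath the IIS site root.
        found.update(p.name for p in root.iterdir() if p.is_file())
    # Any other mode, such as "none", yields an empty set: everything goes to Plone.
    return found
```

With a site root that contains a legacy directory and an old.html file, plus a java_application virtual directory, mode "all" would return all three names, while "fsdir" would return only legacy.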

If you are using auto-excludes, you do not need to enter anything under Local Excludes (although you can). After you press Save, the changes should take effect immediately. If not, you may need to restart your IIS site.

excludes_regex

excludes_regex is very similar to excludes - except for two critical differences:

  • Each entry specifies a regular expression rather than a literal string.
  • Each regular expression is matched against the entire child URL being matched - not just the first portion.

The case of the URL is ignored when matching (this is the same behavior as excludes).

For example, if you wished to exclude all .jpg files from the Plone site (thereby causing all requests for such images to be handled by IIS itself), you could specify .*\.jpg (which will match any URL ending in .jpg). If you only wanted to exclude .jpg files from the root of the proxied site, you could specify [^/]*\.jpg (this second regex fails to match if a slash character appears in the requested URL).
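You can try these two patterns with Python's re module before entering them. The sketch below assumes that the child URL is passed without a leading slash and that the whole string must match; both are assumptions made for illustration, and the helper matches_exclude_regex is hypothetical rather than part of EP.

```python
import re


def matches_exclude_regex(pattern, child_url):
    """True if the pattern matches the entire child URL, ignoring case."""
    return re.fullmatch(pattern, child_url, re.IGNORECASE) is not None


any_jpg = r".*\.jpg"      # any URL ending in .jpg
root_jpg = r"[^/]*\.jpg"  # .jpg files with no slash in the URL

for url in ["logo.jpg", "images/photo.JPG", "images/readme.txt"]:
    print(url, matches_exclude_regex(any_jpg, url), matches_exclude_regex(root_jpg, url))
```

Under these assumptions, images/photo.JPG matches the first pattern but not the second, because [^/]* cannot cross the slash.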

Note: The ordering of the proxy definitions in the ep.ini file does not affect how or whether EP will process them. You can optionally use excludes_regex for more fine-grained control over which proxy definition acts upon a particular URL.

Please refer to the Python regular expressions reference for information on regular expression patterns.

Using Includes

The main rationale for using Includes is to hide most of a Plone site except for one or more selected directories or URLs. This can be useful if a certain directory in a domain is served by a Plone application while the rest of the site is almost completely non-Plone.

Keep in mind that you may be able to do this more simply by modifying the Virtual host root and Local host root in your proxy definition. For example, suppose you have an IIS site and you have a directory in Plone (http://192.168.1.150:8080/Plone/salesdirectory/) which you wish to be called up whenever someone types www.originalfunsite.com/salesdirectory/ . Rather than putting 'salesdirectory' in EP's include field for your proxy definition, it's more direct to try this instead:

  • Local Host Root: /salesdirectory
  • Virtual Host Root: /Plone/salesdirectory
  • Virtual Hosts: 192.168.1.150:8080
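Conceptually, these three settings map a public path to a backend URL. The sketch below shows only that mapping, under the assumption that the Local Host Root is replaced by the Virtual Host Root; it is not how EP builds requests internally (in VHM mode the request is wrapped in a Virtual Host Monster URL), and the function map_to_backend is hypothetical.

```python
def map_to_backend(request_path, local_host_root, virtual_host_root, virtual_host):
    """Map a public request path to the backend URL it corresponds to."""
    local = local_host_root.rstrip("/")
    # Paths outside the Local Host Root are not handled by this definition.
    if not (request_path == local or request_path.startswith(local + "/")):
        return None
    # Swap the Local Host Root for the Virtual Host Root and keep the rest.
    remainder = request_path[len(local):]
    return "http://" + virtual_host + virtual_host_root.rstrip("/") + remainder


print(map_to_backend("/salesdirectory/", "/salesdirectory",
                     "/Plone/salesdirectory", "192.168.1.150:8080"))
```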

includes and includes_regex

By default, if you leave Includes and includes_regex blank, Enfold Proxy will serve the entire site in a normal way. Once you specify anything in the Includes, the proxy definition will exclude everything except the directories and files which you have explicitly declared on this tab.

This means that in most cases, you will not use both include settings and exclude settings at the same time in a proxy definition:

  • If your requirement is to proxy an entire site apart from a few predefined URLs (i.e., all requests go to Plone except for a few predefined IIS applications), you would specify these IIS application roots in 'excludes' (and this is the reason that case is ignored for excludes - IIS itself ignores the case).
  • If your requirement is to proxy only a few predefined URLs (i.e., just one or two pages from your Plone/Zope site, with the rest served by IIS), then you would specify those URLs in includes (and this is why includes are case-sensitive - Plone/Zope is also sensitive to the case).
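The difference in case handling between the two settings can be sketched as follows. This is an illustration only: it assumes that includes, like excludes, are compared against the first path segment, which the text does not state, and the function handled_by_plone is hypothetical.

```python
def handled_by_plone(request_path, includes=(), excludes=()):
    """Decide whether a request would go to Plone (True) or stay with IIS (False)."""
    first_segment = request_path.strip("/").split("/", 1)[0]
    # Includes are case-sensitive, as Plone/Zope URLs are.
    if includes and first_segment not in includes:
        return False
    # Excludes ignore case, as IIS does.
    if first_segment.lower() in {name.lower() for name in excludes}:
        return False
    return True


print(handled_by_plone("/news/today", includes=["news"]))
print(handled_by_plone("/News/today", includes=["news"]))
print(handled_by_plone("/Java_Application/x", excludes=["java_application"]))
```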

The includes_regex feature works in a similar way to excludes_regex (see above).

Look at the example in the screenshot below. When the user types www.originalfunsite.com, the user goes to IIS (and not Plone). But www.originalfunsite.com/news will go to the Plone site. Why are the includes_regex entries necessary? The Plone CSS is located at http://originalfunsite.com/portal_css/Plone%20Default/ploneStyles1199.css, which is outside the news/ directory, so it would not be included unless specifically named. If you didn't include these regular expressions, you would instead see a "stripped version" of the original site (without CSS, images, etc.). This simple regular expression allows www.originalfunsite.com/news to use the accompanying CSS stylesheets (and JavaScript files, and image files) that are not underneath the news/ directory. When trying this technique, remember the possibility of collisions between CSS, JS and image files which are identically named in Plone and in IIS.

images/include_regex.png

Using Simple Rewrite_Mode to Proxy Non-Plone Sites

Note: this is an advanced feature for proxying non-Plone applications. See the limitations below. Most Plone system administrators should only need to have VHM selected.

On your proxy definition, you have an option in Misc --> Rewrite mode. There are two modes: VHM and Simple.

Virtual Host Monster (VHM) is a feature specific to Zope/Plone to translate URLs. Zope will then ensure all URLs served by Zope are relative to the front-end proxy machine. This Virtual Host Monster allows a proxy to request URLs in a special format that includes information about the proxy. The requests look like this:

http://127.0.0.1:8080/VirtualHostBase/http/%{SERVER_NAME}:80/VirtualHostRoot/_vh_zopesite/
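The pieces of such a URL can be seen more easily when it is assembled step by step. The sketch below follows the general Virtual Host Monster layout (VirtualHostBase, protocol, public host and port, optional Zope path, VirtualHostRoot, optional _vh_ segments); the function build_vhm_url and its parameter names are hypothetical, and it is not a description of how EP constructs its requests.

```python
def build_vhm_url(backend, scheme, public_host, public_port,
                  zope_path="", public_prefix="", rest=""):
    """Assemble a Virtual Host Monster style URL from its parts."""
    parts = [backend.rstrip("/"), "VirtualHostBase", scheme,
             f"{public_host}:{public_port}"]
    # Path of the site object inside Zope, e.g. /Plone.
    parts += [seg for seg in zope_path.split("/") if seg]
    parts.append("VirtualHostRoot")
    # Each public path segment in front of the site is marked with _vh_.
    parts += ["_vh_" + seg for seg in public_prefix.split("/") if seg]
    # The rest of the requested path is passed through unchanged.
    parts += [seg for seg in rest.split("/") if seg]
    return "/".join(parts)


print(build_vhm_url("http://127.0.0.1:8080", "http",
                    "www.originalfunsite.com", 80, public_prefix="zopesite"))
```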

Enfold Proxy has another rewriting mode that is more generic and not specific to Zope/Plone. To change Rewrite mode from VHM to Simple, select your proxy definition and choose the Misc. tab.

To illustrate how simple rewrite_mode works (and see its limitations in action), let us create a new website in IIS called testidiot with the host header www.testidiot.com. Change the hosts file so that www.testidiot.com resolves to your local machine (see Verifying that the IIS host resolves correctly).

Next, create a proxy definition called testidiot. Use these settings:

[host testidiot]
rewrite_mode = simple
Virtual Hosts = www.microsoft.com:80
Local host root = /
Virtual host root = /

After saving, type www.testidiot.com in your browser. You will see:

images/simplemodemicrosoft.png

In other words, by adding an external website to your proxy definition's virtual host, you can proxy this external website into your own web domain (and make it appear like your own). This example works at the time of this writing (this may not always be true), but it still should work for simple html sites and sites which do not automatically redirect requests to a domain's home page.

The same thing will work if we change the Virtual Host to www.python.org:80 . (You may need to clear the browser cache before doing this). But what happens if we change the Local Host Root to something different? Is it possible to make the python home page appear on www.testidiot.com/python instead of www.testidiot.com ?

If the virtual host were a Plone site, you would simply need to change Local Host Root to be /python and everything would work fine. But that is not what happens when the Virtual Host is www.python.org:80 . In the specific case of www.python.org, if you proxy it to www.testidiot.com/python/ , you may see text for the home page, but other resources (such as CSS) will not appear. Also, if you click on a python.org navigation link (such as to News http://www.python.org/news ), the hyperlink goes to www.testidiot.com/news rather than www.testidiot.com/python/news.

This reveals one of the limitations of using Simple rewrite_mode to proxy web servers. EP can proxy an external website to the root of an IIS domain or host, but it cannot proxy to a subdirectory of a domain on IIS. Therefore, links in content served by the remote host may not be correct, unless the proxied server always serves up 'relative' URLs, or is specifically configured for its URLs to be correct with respect to a front-end proxy.
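The broken News link follows directly from how browsers resolve links. The short Python sketch below uses the standard urllib.parse.urljoin function to apply the same resolution rules to the proxied page; the variable names are only for illustration.

```python
from urllib.parse import urljoin

# The page as the browser sees it after proxying python.org under /python/.
page = "http://www.testidiot.com/python/"

# A root-relative link, as served by the remote site, escapes the /python/ prefix.
root_relative = urljoin(page, "/news")

# A truly relative link stays underneath /python/.
relative = urljoin(page, "news")

print(root_relative)
print(relative)
```

The first link resolves to http://www.testidiot.com/news and the second to http://www.testidiot.com/python/news, which is why only servers that emit relative URLs (or that know about the front-end proxy) work under a subdirectory.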

As a result of these limitations, simple proxy mode is useful only for servers that can be specifically configured with knowledge of a front-end proxy. The Django server is an example of such a server.

Caching and simple rewrite_mode

Assuming that 1) the server is able to serve relative URLs that are correct for the proxy AND 2) the server sets the cache headers appropriately, Enfold Proxy will cache the content. To determine whether the cache headers are set appropriately, consult an online cacheability engine.