Friday, July 28, 2017

PEGA Data Pages



Understanding data pages

Data pages (known before Pega 7 as "declare pages" or "declarative pages") store data that the system needs to populate work item properties for calculations or for other processes. When the system references a data page, the data page either creates an instance of itself on the clipboard and loads the required data into it, or responds to the reference with an existing instance of itself.
Data pages obtain their data from external sources through connectors, from report definitions that query the PRPC database, or from other sources, and they can use data transforms to make that data available in the form and place where it is needed.
The name of a data page starts with the prefix D_ or Declare_. On the clipboard, the contents of data page instances are visible but read-only.
The Data Explorer in the Designer Studio lists all the data pages available to your application. There you can quickly add additional data pages and data object types (classes).

Concepts

Node and Thread scope in data pages
Access Groups for data pages
Conditional refresh strategy for data pages
Comparing data pages with other clipboard pages
Data virtualization in PRPC
Enjoy improved PRPC performance with more intelligent data sourcing

Creating and using data pages

Manage your application's data object types and data pages with the Data Explorer
How to create a data page (declare page)
Naming data pages
Use parameters when referencing a data page to get the right data
Manage multiple data sources for a data page
After loading data, automatically call an activity to process it

Comparing data pages with other clipboard pages

Data pages (known before PRPC 7.1 as "declare pages") have many similarities with other clipboard pages. They are accessed the same way, so you do not need a Java step to access the data on a data page. Data pages also hold the same kinds of data as regular pages: properties, embedded pages, and so on.
However, there are important differences between data pages and other clipboard pages.
  • Clipboard location – Data pages appear in the Data Pages area (version 7.1) or the Declared Pages area (versions 5.1-6.3 SP2) of the clipboard, not under the pxThread or pxRequestor pages.
  • Read only – Data pages are read-only; you cannot add or remove data after they are created.
  • Naming convention – The names of data pages must begin with the string "D_" or "Declare_" (version 7.1) or "Declare_" (versions 5.1-6.3 SP1). Other types of pages cannot begin with these strings.
  • Creation – A data page is created automatically, if it is not already present, whenever any property on the page is accessed. You do not have to create these pages explicitly with the Page-New method or other methods.
  • Update procedure – Data pages can have an automatic refresh schedule, which keeps their contents up to date.
  • Not saved to the database – Unlike other pages (such as work item pages), data pages are not saved to the PegaRULES database.
  • Passivation – When a requestor is passivated to the database, all of that user's information is serialized and temporarily saved to the database. If the user's clipboard contains any Thread-scope data pages, those pages are not saved. Instead, the system deletes them as it passivates the requestor, then recreates them the next time that requestor references them (after the requestor is reactivated).
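Three of the differences above (the naming convention, automatic creation on first reference, and read-only contents) can be made concrete with a small Python model. This is a hypothetical sketch, not Pega code: DataPageRegistry, define, and get are invented names, and Pega does not expose data pages this way.

```python
from types import MappingProxyType
from typing import Callable, Dict, Mapping


class DataPageRegistry:
    """Hypothetical stand-in for the Data Pages area of the clipboard (not a Pega API)."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], dict]] = {}
        self._pages: Dict[str, Mapping] = {}
        self.load_count = 0  # number of times a loader actually ran

    def define(self, name: str, loader: Callable[[], dict]) -> None:
        # Naming convention: only data pages may start with D_ or Declare_.
        if not name.startswith(("D_", "Declare_")):
            raise ValueError(f"data page names must start with D_ or Declare_: {name}")
        self._loaders[name] = loader

    def get(self, name: str) -> Mapping:
        # Created automatically on first reference; no Page-New step is needed.
        if name not in self._pages:
            self.load_count += 1
            # MappingProxyType is a read-only view, mirroring read-only data pages.
            self._pages[name] = MappingProxyType(dict(self._loaders[name]()))
        return self._pages[name]
```

A second get call for the same name returns the page already loaded, and any attempt to assign into it raises TypeError.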

Data virtualization in PRPC

You may need to connect a data page of one object type to a data source of another, incompatible type. Pega 7 lets you do this quickly and relatively simply with a data transform, enabling data virtualization.
In data virtualization (http://en.wikipedia.org/wiki/Data_virtualization), the logical application data model is decoupled from the physical integration data models (the data sources). The data page serves as the singular reference point throughout the application, joining application data models and integration data models.
Changing, adding, or removing an integration point that a data page is sourced from means that you only have to modify, add, or remove a mapping data transform.
For example, suppose the data page is of one type and the data source is of another, incompatible type.
You can specify a response data transform of the same type as the data page to map the data from the data source to the data page, making it usable by the application.
The data transform runs automatically on each reference to the data page that uses that data source, mapping the data in the manner you specify.
A data page can have multiple data sources that it uses in different circumstances, depending on the parameters it receives with each reference. See Manage multiple data sources for a data page.
Regardless of which data source the situation requires, the data page maps the data it receives to the one common application data model.
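Outside Pega, the same idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Pega code: Customer stands in for the application data model, from_northwind and from_crm stand in for response data transforms, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Customer:
    """The one application data model the data page exposes (hypothetical)."""
    customer_id: str
    name: str


# Hypothetical "response data transforms": one per integration model,
# each mapping a source-specific record onto Customer.
def from_northwind(record: dict) -> Customer:
    return Customer(customer_id=record["CustomerID"], name=record["CompanyName"])


def from_crm(record: dict) -> Customer:
    return Customer(customer_id=record["id"], name=record["display_name"])


TRANSFORMS: Dict[str, Callable[[dict], Customer]] = {
    "northwind": from_northwind,
    "crm": from_crm,
}


def load_customer(source: str, raw: dict) -> Customer:
    # Callers always get a Customer, whichever source supplied the data.
    return TRANSFORMS[source](raw)
```

Adding a third source would mean adding one mapping function and one TRANSFORMS entry; callers of load_customer would not change.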


How Pega 7 manages data pages
Data pages store data that the system needs to populate work item properties for calculations or for other processes. Developers don't have to create, populate, or otherwise manage data page instances.
When the system references a data page, the data page either creates an instance of itself on the clipboard and loads the required data in it for the system to use, or responds to the reference with an existing instance of itself. For a general introduction, see Understanding data pages.
The system manages data pages based on a combination of settings and circumstances, as outlined below. It automatically deletes, or prunes, data page instances that are no longer needed, and it also prunes instances when the maximum number of instances is reached.

New data pages

To create a new data page, see How to create a data page.
The system creates a new data page instance when the data page is referenced, or uses an existing instance, depending on the settings in the Refresh Strategy section of the rule form's Load Management tab. See Data pages - conditional refresh strategy.

Pruning data pages

The system removes, or prunes, data pages in several circumstances.

Set single use

You can instruct the system to delete any existing data page instances and create a new instance every time the data page is referenced. To do this, check the Limit to a single data page check box on the Load Management tab.

With the check box checked, each time the system references the data page, it removes any existing instance and uses submitted parameters to create a new data page instance.
If the check box is not checked, the system creates a new instance of the data page for each reference with unique parameter values, which can cause the number of stored instances of the data page to grow rapidly.
This option is useful for parameterized data pages. See Use parameters when referencing a data page to get the right data.
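The effect of the check box on instance counts can be shown with a small Python model. ParameterizedPage and its reference method are hypothetical names, not Pega APIs, and keying each instance on its parameter values is an assumption about how unique references are told apart.

```python
from typing import Callable, Dict, Tuple

ParamKey = Tuple[Tuple[str, str], ...]


class ParameterizedPage:
    """Hypothetical model of the Limit to a single data page option (not a Pega API)."""

    def __init__(self, loader: Callable[..., dict], limit_to_single: bool) -> None:
        self.loader = loader
        self.limit_to_single = limit_to_single
        self.instances: Dict[ParamKey, dict] = {}

    def reference(self, **params: str) -> dict:
        key: ParamKey = tuple(sorted(params.items()))
        if self.limit_to_single:
            # Checked: drop any existing instance and always load a fresh one.
            self.instances.clear()
        elif key in self.instances:
            # Unchecked: reuse the instance created for these exact parameter values.
            return self.instances[key]
        page = self.loader(**params)
        self.instances[key] = page
        return page
```

Referencing with CustomerID values A, B, A leaves two stored instances when the option is off, and one (reloaded three times) when it is on.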

Specify clearing unused pages

You can instruct the system to remove unused data page instances by checking the Clear pages after non-use check box on the data page rule's Load Management tab. This option is selected by default.

This setting has different effects depending on the scope of the data page:
  • Thread scope — This setting has no effect.
  • Requestor scope — This setting applies to both editable and read-only data page instances. The system creates a requestor-scoped instance when any thread refers to the data page, and other threads of the same requestor can then reference that same instance. If the check box is checked, the instance is removed when no threads refer to it any longer.
  • Node scope — This setting applies to read-only data pages. If the check box is checked, the system checks the Reload if older than fields on the Load Management tab of the data page rule.

    The system uses any setting in these fields. If the fields are blank, the system uses the value of the dynamic system setting DeclarePages/DefaultIdleTimeSeconds, which is set by default to 86400 seconds, or one day. If you wish, you can adjust the dynamic system setting's value.
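The per-scope effect of Clear pages after non-use can be summarized as one decision function. This is a hypothetical sketch, not Pega code: PageInstance and clear_after_non_use are invented names, and the node-scope check assumes the timeout is compared against the time since the instance was last used.

```python
from dataclasses import dataclass
from typing import Optional

# Default of DeclarePages/DefaultIdleTimeSeconds, as stated above (one day).
DEFAULT_IDLE_TIMEOUT = 86400


@dataclass
class PageInstance:
    scope: str                                    # "thread", "requestor", or "node"
    referring_threads: int = 0                    # only used for requestor scope
    idle_seconds: float = 0.0                     # only used for node scope
    reload_if_older_than: Optional[float] = None  # None when the fields are blank


def clear_after_non_use(inst: PageInstance, default_idle: float = DEFAULT_IDLE_TIMEOUT) -> bool:
    """Return True if a checked Clear pages after non-use would remove this instance."""
    if inst.scope == "thread":
        return False  # the setting has no effect on thread scope
    if inst.scope == "requestor":
        return inst.referring_threads == 0
    if inst.scope == "node":
        # Blank Reload if older than fields fall back to the dynamic system setting.
        # Assumption: the value is compared against idle time since last use.
        limit = inst.reload_if_older_than if inst.reload_if_older_than is not None else default_idle
        return inst.idle_seconds > limit
    raise ValueError(f"unknown scope: {inst.scope}")
```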

The number of data page instances for a container reaches the set limit

By default, Pega 7 can maintain 1,000 unique read-only instances of a data page per thread. You can change this value by editing the dynamic system setting datapages/mrucapacity.
There are different data page instance containers for the thread, requestor, and node level. Each user can have both requestor-level and thread-level data pages up to the limit established. Additionally, each node can have any number of requestors, and each requestor can have many threads. See Contrasting PRThread objects and Java threads.
For each container:
  • If the number of instances of a data page reaches 60% of the established limit for thread- or requestor-level containers, or 80% of the established limit for node-level containers, the system begins deleting older instances.
  • If the number reaches the established limit, the system deletes all data page instances that were last accessed more than ten minutes previously for that container.
  • If, after that step, the number of instances of the data page still exceeds the set limit, the system tolerates an overload up to 125% of the established limit for thread- or requestor-level containers, or 120% of the established limit for node-level containers.
  • If the number exceeds that overload number, the system deletes instances (irrespective of when they were last accessed) until the number of entries in the cache is below the set limit.
The data page creates new instances as needed to respond to references and to replace the deleted instances.
As the count of data page instances approaches the limit, the system displays the PEGA0016 alert. See Understanding the PEGA0016 alert - Cache reduced to target size.
This pruning behavior is always active. In addition, you can enable either or both of the first two methods described above (Limit to a single data page and Clear pages after non-use) for any data page.
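The limit rules can be modeled as a small least-recently-used cache. The sketch below is hypothetical: InstanceContainer and its methods are invented names, the ratios and ten-minute window come from the list above, and the gradual cleanup that starts at 60% or 80% is only reported by above_cleanup_threshold, because the article does not say which instances that early cleanup removes.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# (cleanup starts, overload tolerated) as fractions of the limit, per the article.
THRESHOLDS = {
    "thread": (0.60, 1.25),
    "requestor": (0.60, 1.25),
    "node": (0.80, 1.20),
}
STALE_SECONDS = 10 * 60  # "last accessed more than ten minutes previously"


@dataclass
class InstanceContainer:
    """One thread-, requestor-, or node-level container of data page instances."""
    level: str
    limit: int = 1000  # the article's default for datapages/mrucapacity
    # instance key -> last access time, ordered least to most recently used
    entries: OrderedDict = field(default_factory=OrderedDict)

    def above_cleanup_threshold(self) -> bool:
        """True once the count reaches 60% (thread/requestor) or 80% (node) of the limit."""
        start_ratio, _ = THRESHOLDS[self.level]
        return len(self.entries) >= self.limit * start_ratio

    def touch(self, key: str, now: float) -> None:
        """Record a reference to an instance, then apply the limit rules."""
        self.entries[key] = now
        self.entries.move_to_end(key)
        self._prune(now)

    def _prune(self, now: float) -> None:
        if len(self.entries) < self.limit:
            return
        # At the limit: drop every instance last accessed more than ten minutes ago.
        for key, last_access in list(self.entries.items()):
            if now - last_access > STALE_SECONDS:
                del self.entries[key]
        # Beyond the tolerated overload: evict least recently used until below the limit.
        _, overload_ratio = THRESHOLDS[self.level]
        if len(self.entries) > self.limit * overload_ratio:
            while len(self.entries) >= self.limit:
                self.entries.popitem(last=False)
```

With limit=10 at thread level, the container tolerates 12 recently used instances, but a 13th pushes it past 125% and it is cut back to 9.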

Forcing removal of data pages

You can also force removal of data page instances, without regard to the settings described above. To do this, use one of these options:
  • In Designer Studio, click the Clear Data Page button on the Load Management tab of the rule form.
    This clears all read-only instances of the data page from the clipboard according to their scope:
    • Thread-scoped pages — the system removes all instances of the data page from all threads of the current requestor.
    • Requestor-scoped pages — the system removes all instances of the data page from the current requestor.
    • Node-scoped pages — the system removes all instances of the data page from all nodes in the cluster.
  • In an activity, use the Page-Remove step with the data page as the step page. This method deletes read-only and editable data page instances regardless of the scope, as long as the data page is accessible by the thread that runs the activity.
  • Call the ExpireDeclarativePage rule utility function, passing the data page name as its parameter, to delete read-only, non-parameterized data page instances:
    • For Thread-scoped data pages, the system removes data page instances from the current thread of the requestor.
    • For Requestor-scoped data pages, the system removes data page instances from the current requestor.
    • For Node-scoped data pages, the system removes data page instances from all nodes in the cluster.
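The thread-scope difference between the Clear Data Page button and ExpireDeclarativePage is easy to miss. This hypothetical Python model (Requestor, clear_data_page_button, and expire_declarative_page are invented names, not Pega APIs) shows only that difference.

```python
from typing import Dict, Set


class Requestor:
    """Hypothetical model: each thread of a requestor holds its own thread-scoped pages."""

    def __init__(self, threads: Dict[str, Set[str]]) -> None:
        self.threads = threads


def clear_data_page_button(req: Requestor, page: str) -> None:
    # Clear Data Page button: a thread-scoped page is removed from ALL threads.
    for pages in req.threads.values():
        pages.discard(page)


def expire_declarative_page(req: Requestor, current_thread: str, page: str) -> None:
    # ExpireDeclarativePage: a thread-scoped page is removed from the CURRENT thread only.
    req.threads[current_thread].discard(page)
```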

Passivation

Passivation allows the state of a Java object — such as an entire Pega 7 PRThread context, including the clipboard state — to be saved to a file. A later operation, known as activation, restores the object.
Pega 7 uses standard passivation in general operation, but you can also configure passivation to shared storage in highly available environments. When all or part of a requestor clipboard is idle for an extended period and available JVM memory is limited, Pega 7 automatically saves clipboard pages in a disk file on the server. This frees up JVM memory for use by active requestors. (Typically, such passivation occurs only on systems supporting 50 or more simultaneous requestors.)
The system passivates editable data page instances, but discards read-only data page instances.
For more about passivation, see Creating a custom passivation method.
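A minimal Python sketch of the rule that editable instances are passivated while read-only ones are discarded. It assumes JSON as a stand-in for whatever serialization Pega actually uses; passivate and activate are hypothetical names.

```python
import json
from typing import Dict, List


def passivate(pages: Dict[str, dict], editable: List[str]) -> str:
    """Serialize editable data page instances; read-only instances are dropped."""
    kept = {name: data for name, data in pages.items() if name in editable}
    return json.dumps(kept, sort_keys=True)


def activate(blob: str) -> Dict[str, dict]:
    """Restore saved instances; dropped read-only pages reload on their next reference."""
    return json.loads(blob)
```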


Use parameters when referencing a data page to get the right data


Summary

Data pages provide quick, accurate access to the data your application needs, when it needs it. Calling data pages with parameter values lets the data page provide exactly the data required for a situation, from the most appropriate data source, on demand. The system waits until a user action or some other trigger causes a data request, and then loads the data automatically.
Data pages transform the raw data received from a data source into data the application needs and can use.
An application that uses data pages, and that passes parameter values to them to get the right data to the right place, can build a responsive and rich structure without creating a lot of data pages, activities, and other code. Because PRPC supports multiple instances of the same data page, you can use the same design-time definition for multiple contexts simultaneously, within the same or different threads, without affecting other instances that have been loaded into memory. For frequently changing data page references, it’s possible to reuse an instance of a data page that’s already in memory. There is no need to hard-code and maintain references to data sources.

Data Page parameters

Data pages have a Parameters tab where you can specify the parameters the system can use when referencing that data page.

Make sure the names are descriptive, so you and other developers can see easily what sort of values they expect.
For each parameter you can set its type, whether it is required, and other settings. Setting a parameter to required, for example, changes the way the data page provides data to an auto-populated property (see below). An auto-populated property only attempts to load the data page that is supposed to provide its data when the required parameters for that data page have values. On the other hand, if there are no required parameters for the data page, an auto-populated property references the data page immediately, as soon as one of the listed (optional) parameters is set.
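The rule for when an auto-populated property loads its data page can be written as a single predicate. This is a hypothetical sketch (should_load is not a Pega function), assuming that an empty string counts as "no value".

```python
from typing import Dict, Optional


def should_load(params: Dict[str, bool], values: Dict[str, Optional[str]]) -> bool:
    """params maps parameter name -> required flag; values holds the current values."""
    def has_value(name: str) -> bool:
        return values.get(name) not in (None, "")

    required = [name for name, is_required in params.items() if is_required]
    if required:
        # Wait until every required parameter has a value.
        return all(has_value(name) for name in required)
    # No required parameters: load as soon as any optional parameter is set.
    return any(has_value(name) for name in params)
```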
Data page parameters are used in two main ways:
  • Embedded auto-populate properties. For hierarchical data relationships and contextual referencing, embedded auto-populate properties are the easiest and preferred way to access and source data automatically from multiple hierarchically related data pages.
  • Parameterized data pages. When you don't need to maintain hierarchical relationships or case context, you can directly refer to data pages with parameters. See the next section.

Passing parameters to data pages using a property

Properties can reference data pages and send parameter values from a section on the property's General tab.

The data page returns to the property (in this case, a single page) the information for the customer whose CustomerID is passed in the property's Parameters section. An asterisk beside a field label indicates that a value for that parameter is required when referencing this data page.

Passing parameters to data pages

Data pages can provide parameterized data to many other PRPC elements besides properties. Any of the following can reference a data page with parameters and get back data appropriate to its situation and requirements:
  • Activity
  • Ant script
  • Batch
  • Case Match
  • Collection
  • Constraint
  • Data Transform
  • Decision Map
  • Decision Table
  • Decision Tree
  • Declare Expression
  • Function Alias
  • Infer
  • Interaction
  • Property Alias
  • Scorecard
  • Strategy
  • When

How to pass parameters directly to a data page

Each instance of a data page on the clipboard has a fully-qualified name, such as D_Customer_pa6671911977865993pz. Do not use the fully-qualified name when referencing the data page. Instead, use the name of the data page itself and add the value for any parameters. The data page then determines (see below) whether to load a new instance of itself onto the clipboard to respond to the reference, or to refer to the correct instance already on the clipboard.
The syntax is: <data page name>[<comma delimited list of parameter name:value pairs>]
You can refer to a data page with one of several valid forms of this syntax. For D_Customer, you could load or refer to the data page using any of these:
  • D_Customer[CustomerID:"ANTON",CompanyName:"BigCo"]
  • D_Customer[CustomerID:.CustID]
  • D_Customer[CustomerID:param.CustID]
When the data page only has one parameter, you don't have to specify the parameter name. You only need to specify the value:
  • D_Customer["ANTON"]
  • D_Customer[.CustID]
  • D_Customer[param.CustID]
See How to create a data page (declare page) and Tailor data pages to the context in which you use them.
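To make the reference syntax concrete, here is a hypothetical parser written in Python. It is not how Pega parses references; it assumes straight double quotes around literals and accepts only the three value forms listed above (a quoted literal, a .Property reference, and a param. reference).

```python
import re
from typing import Dict, List, Tuple

REFERENCE = re.compile(r"^(?P<page>(?:D_|Declare_)\w+)\[(?P<args>.*)\]$")
NAMED = re.compile(r"^(?P<name>[A-Za-z_]\w*)\s*:\s*(?P<value>.+)$")


def split_args(args: str) -> List[str]:
    """Split on commas that are not inside a quoted literal."""
    parts, buf, quoted = [], [], False
    for ch in args:
        if ch == '"':
            quoted = not quoted
        if ch == "," and not quoted:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def classify(value: str) -> Tuple[str, str]:
    """Label a value as a literal, a parameter reference, or a property reference."""
    if value.startswith('"') and value.endswith('"'):
        return ("literal", value[1:-1])
    if value.startswith("param."):
        return ("param", value)
    if value.startswith("."):
        return ("property", value)
    raise ValueError(f"unsupported value: {value}")


def parse_reference(ref: str, declared: List[str]) -> Tuple[str, Dict[str, Tuple[str, str]]]:
    """Return (data page name, {parameter: (kind, value)}) for a reference string."""
    m = REFERENCE.match(ref)
    if not m:
        raise ValueError(f"not a data page reference: {ref}")
    params: Dict[str, Tuple[str, str]] = {}
    for part in split_args(m.group("args")):
        named = NAMED.match(part)
        if named:
            params[named.group("name")] = classify(named.group("value").strip())
        elif len(declared) == 1:
            # Single-parameter shorthand: the parameter name may be omitted.
            params[declared[0]] = classify(part)
        else:
            raise ValueError("parameter names are required when the page has several parameters")
    return m.group("page"), params
```

For example, parse_reference('D_Customer["ANTON"]', ["CustomerID"]) yields ("D_Customer", {"CustomerID": ("literal", "ANTON")}).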

How a data page responds to a reference with parameters

When PRPC references a data page with parameters, the data page checks whether it can use an existing instance of itself on the clipboard or needs to load a new instance.

If there is no instance of the data page on the clipboard, or if the refresh strategy defined on the data page requires loading a fresh instance, or if the parameters passed do not match the parameters used to create an existing instance on the clipboard, then the data page loads a new instance of itself onto the clipboard with the parameterized data that the current call requires.
Each time the data page finds it can respond with an existing instance of itself already in the clipboard, PRPC saves time and effort by not having to go back to the database for data.
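The decision above can be sketched as a reuse-or-load check keyed on the parameter values. In this hypothetical Python model, DataPage, reference, and the reload_if_older_than argument are illustrative names (the last loosely echoes the Reload if older than setting), not Pega APIs; an instance is reused only when one exists for the exact same parameters and it is not older than the configured age.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[Tuple[str, str], ...]


class DataPage:
    """Hypothetical reuse-or-load logic for one parameterized data page (not a Pega API)."""

    def __init__(
        self,
        loader: Callable[..., dict],
        reload_if_older_than: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.loader = loader
        self.max_age = reload_if_older_than  # seconds; None means never stale
        self.clock = clock
        self.cache: Dict[Key, Tuple[float, dict]] = {}
        self.loads = 0  # number of trips to the data source

    def reference(self, **params: str) -> dict:
        key: Key = tuple(sorted(params.items()))
        now = self.clock()
        hit = self.cache.get(key)
        # Reuse only an instance built from the same parameters that is still fresh.
        if hit is not None and (self.max_age is None or now - hit[0] <= self.max_age):
            return hit[1]
        self.loads += 1
        data = self.loader(**params)
        self.cache[key] = (now, data)
        return data
```

Injecting the clock keeps the model testable: two references with CustomerID="ANTON" cost one load, a different CustomerID costs another, and the same reference after the age limit triggers a fresh load.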


Manage multiple data sources for a data page

Data pages can have multiple data sources, and you can set rules that determine which data source to use in a given situation so the application gets the right data every time.
At a high level, when the application invokes the data page, it can send values for one or more data page parameters. The data page can use the parameter values to select which of its data sources to use to respond to the current call, and which data from that data source to return.
Calling data pages with parameter values simplifies design and development and promotes code reuse, since a single data page can serve as a hub, quickly assembling and delivering the right set of data for a wide range of calls. Using parameters eliminates the labor and maintenance cost of creating and maintaining hard-coded calls for data.
To make use of multiple data sources, you need to do the following on the data page:

Specify more than one data source

On the Definition tab of a data page, you can specify one or many data sources.

If you specify multiple sources, a field appears where you identify the When condition (see below) that evaluates whether the current reference to the data page requires using the source identified in each row. The condition for the final data source is set as "Otherwise": that data source is used if all the preceding When conditions evaluate to false.
Click "Add New Source" to define an additional data source.
If more than one source is listed, you can delete all but one. To remove a source, click the "X" to the right of its information.
You can drag data sources higher or lower in the list to set the order in which the system checks whether they should be used.
See How to create a data page (declare page) and the help documentation for details about what information to provide in each field.

Specify parameters

When the system references a data page, it can pass one or more parameters that the data page can use to select exactly the data the system requires. Set those parameters on the Parameters tab.

See Use parameters when referencing a data page to get the right data.

Create a When condition

The data page uses the parameter values submitted with the reference, together with a When condition, to determine what data to send back. For example, a When condition might check whether the value of the submitted parameter searchProvider is "Northwind".
If it is, the data page provides (and appropriately transforms so the application can use it) data drawn from the Northwind data source.
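The ordered When-condition check can be sketched in Python as a first-match search over the source list. The source names and the choose_source function are hypothetical; a None condition plays the role of the final Otherwise row, and list order mirrors the drag-and-drop order on the Definition tab.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A When condition takes the submitted parameters; None stands for "Otherwise".
When = Optional[Callable[[Dict[str, str]], bool]]


def choose_source(sources: List[Tuple[When, str]], params: Dict[str, str]) -> str:
    """Return the first source whose When condition is true, checked in list order."""
    for condition, source in sources:
        if condition is None or condition(params):
            return source
    raise LookupError("no When condition matched and no Otherwise row is defined")


# Hypothetical configuration mirroring the Northwind example.
SOURCES: List[Tuple[When, str]] = [
    (lambda p: p.get("searchProvider") == "Northwind", "Northwind connector"),
    (None, "Default report definition"),
]
```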
With these steps, the data page is prepared to respond to references sent with parameter values by returning data designed for the needs of each reference.
The data page can create a separate instance of itself on the clipboard for each unique set of parameter values it is referenced with, or you can restrict the number of data page instances to one, so that each new reference overwrites the data page instance on the clipboard.