More test data please...
by
, 22-Jan-2014 at 09:44 AM (18166 Views)
This blog covers a little project I recently have been working on to create a larger and more diverse data set for the order entry workspace delivered as one of the examples. The set of data in the database delivered with version 17.1 was already increased and changed. This was mainly done to get a better set of data for the demonstration of Dynamic AI integration in a DataFlex application. The data in older versions of the workspace was a bit outdated and limited in size. While the increase was done technically correct, it did not show very well in a couple of Visual Report Writer reports I recently made. A report showing which customers are ordering a specific article (see Interactive Report in Windows with Visual Report Writer) showed a repetitive pattern that looked awkward. To get better test data I started creating an order sample generator view. Each time you will run this, different orders and order detail lines will be created. The view to create this will be included as part of the Specialized Components workspace in the next revision. This blog describes what issues played a role and you should be able to extract information to create a similar component for create more sample data in your own applications. More test data is vital for knowing if the application behaves well especially on areas as filters (constraints). Unless you are working on a copy of the data provided by a customer you will be testing with a limited set of data, no developer likes to key in really big amounts of test data.
The order entry example
The example is nice for showing some programming techniques. There are 6 tables (customer, salesp, orderhea, orderdtl, invt and vendor). The amount of data however, is quite small. There are only 47 articles on sale, there is a stunning 9 vendors that create and/or sell these articles, and there are 100 customers which are served by 28 salespersons. I don't know what company can afford to have that many salespersons to serve this low number of customers, but OK. There is no orders processing, no billing
OTOH, if you take the AdventureWorks (2000) example database from Microsoft you will see similar issues. The number of customers (individuals or shops) is higher but the sales are so low that it is impossible to keep the business running. The shop sells black bikes but if you look at the picture of that bike you discover it is green and the same for red bikes and so on. There is a table with product reviews but no reviews have been written. None of the employees ever got paid or switched from department. Their vendor table consist of 104 vendors. The currency table shows currency codes that are no longer available and still updated after expiration date. The order entry example isn't that bad if you compare.
Running totals
The standard order entry system maintains a running total value for the order total, the number of articles on stock, the sum of purchases for a customer and the balance due. This is all enforced by business rules in the data dictionaries. Upon creation, deletion or editing of an order detail row the amount of articles on stock, the order total and the customer balance due and purchases are updated in the back-out and update events of the order-header and order-detail data dictionary classes.
While it is nice to see the amount of inventory items on stock decrease by creating a new order detail line this won't work during automatic order creation. Either the product runs out of stock, generating a validation error (validate_save) or you would need to have a tremendous number of articles or stock before creating the orders. Therefore the business process object I created in the example disables the update of the on_hand value in the inventory table during update and back-out. This means the on_hand should be seen as frozen value, a timeline snapshot value. Deleting or changing individual orders will increase the stock but it will never run out of stock due to the automatic order creation process.
The running total for the order (order_total) is just fine. It is the sum of the order detail extended_price column values. Note that this column is usually not present in a SQL based system as it is redundant. The value can be calculated over a SUM() instruction. If the value was not a running total but a sales value (which can be identical to the sum) it would be present. Imagine the sum of the lines to be € 2001,23, salespersons will round this down to 2000 or even 1999 to make the order more attractive. Anyway in the order entry system the value is a running total and automatically maintained by the DD business rules, fine.
The two other running total values, at customer level, are more of a problem. The purchases value would 'never' be a grand total value of all orders for this customer over all the years, instead it would be the value for the current year, a YTD value so to say. The way it is currently constructed in the DD makes the value too unrealistic if the system contains orders over multiple years. Of course it is nice to see that customer 1 (Access Miles) has purchase worth € 205,939.00 over all orders in the system but it makes no sense in the order entry system as it is present. Many of these purchases could have been done 5 years ago and the customer is not such good customer as it looks like based on the total purchases. Purchase figures like this can afford the 28 salesperson in the company. A report showing the profit made on the sales should be able to tell us this. Where the on_hand running total could be kept as frozen the generator program cannot do this for purchases and balance due as they will get negative The size of the running total columns turned out to be too small when creating a large amount of orders. Especially the purchases and balance due columns in the customer table are too small (numeric 6.2). In your own environment you need to pay attention to these kind of columns. Make them large enough to hold bigger sets of data.
Inventory used in order detail
This is well done in the order entry system. An article (from INVT) has a list price and this is copied into order detail row prior to save the row. The user can change the price per unit and make it a more or a less profitable sale. However, it is impossible to see if the difference between unit price from the inventory table and the used price in the order detail table is caused by a price change of the article or a salesperson that changed the price for this sale. For that the system would need to have an inventory history table.
The generator view
The orders generator view functionality can be split into collecting information for creating the orders and the process of creating the data rows.
For the initial version of the user interface I decided that it was enough to collect the number of order rows to create, whether existing data should be deleted and being able to enter the date range (from and to dates).
The processing code is written inside a BusinessProcess object containing data dictionary objects for all tables of order entry workspace. A BusinessProcess object suits well for this as it isolates the processes, makes logging possible and keep the user up-to-date with the progress.
Inside the OnProcess event I started using the Random() function to select a date from the available date range and a random selection of customers, shipping methods, terms and sales persons.
While in the real world most orders will be placed during business days (Monday-Friday) the random date routine did not supportthat. Yes, the world is rapidly moving into the use of web order (cloud) systems over physical stores and those systems operate 24/7 where customers can place orders during the weekend and nights but the standard order entry example does not have individuals as customer, only companies and thus an order during a weekday makes more sense.
The same could be said about sales persons. If the system randomly picks a salesperson for a new order you will see a nearly equal sales result for each of the available salespersons while in the real world some salespersons will sell more than others. And what about the inventory items? Some articles will be more attractive and should sell more often than others. Using purely the random() function would statistically get a more flat set of articles sold.
To avoid a too boring set of data it was decided to add a weight feature to the system and the operator creating the data can decide how often a salesperson, a shipvia code, a terms code, a customer, a product or a weekday will appear in the to-be-created order set. The generator is even extended to make periodic sales tweak-able, some quarters will generate more sales than others. The weight works as follows: Normally each customer, each product appears once in an array to pick from. By applying a weight the selected products, salespersons, etc., are added more than once to the array resulting in a higher frequency of that value. The array of customers to pick from is extended by the weight on the customer's state (FL, NY, CA etc.).
Another feature to create a more interesting data profile is a correction on the credit limit. It is a bit silly to see a credit limit of 1000 while the average order is higher than that. Decided was to create a routine that scans the orders made by a customer, sum the order totals per quarter and use the highest value of that for its credit limit. Look this up in the code for AdjustCreditLimit.
The value for balance due gets increased for an order but should be decreased when a payment is received. Payments, billing, is not part of the order entry example application. To avoid that the value is identical to the amount of purchases the order generator contains a correction routine which, again randomly, reduces the balance due value. This introduces the risk that the balance due gets negative when orders are deleted but let's not worry about that fornow. The operator can enter the random range in a percentage value. So, reduce the balance due with a random percentage between 0 and a maximum.
The generator view contains two BusinessProcess objects; one for deleting existing orders and one for creating. This is done to simplify the code. During normal deletion the order total, the balance due and the purchases are re-calculated but the system does not need this. In fact the delete process first tries to delete the data via a zerofile statement and if that fails a row-by-row delete.
The grid with weekdays and their weights show the localized name of the weekday according to the Windows API. I used the code I published in the Retrieve weekday names in current language setting in Windows knowledge-base article from March 2000.
Remember the values
All the values that can be entered to generate the orders are stored to an XML file when the view closes. The values are restored either when the control for entering the value is added to the focus tree or when the process needs the values. If there is no XML file the controls fill themselves with data from the tables (inclusive validation tables), taking random weight values. In the code look for the methods ReadFromXMLData and StoreToXMLData and in each grid for Activating. The XML file is maintained via the FleXML classes in the DataFlex framework. After a save, the code I published to make XML human readable is executed.
The XML file is stored in one of the workspace datapath folders. A test is done whether the folder does exist before attempting to use that folder for the XML filename. The PathIsDirectory Windows API function is used for this testing. Look this up in the OrdersGeneratorSettings function for this.
Normally views store their size and location in the Windows registry when the program is stopped and restore this information when it is started. The generator view does not do this by setting the pbDisableSaveEnvironment to true. The view is resizable with a minimum and maximum size set during activation with values based on the design time size of the view component. Look for the code in the activating event.
Business rules can bite!
After a first run we found out that the system never created orders for one particular customer. By turning on the Display_Error_State it turned out that this was caused by an invalid e-mail address for this customer, which was tested during the save validation process. It is nice to show saves can be blocked during data entry but for an orders generator this had to be turned off.
Correcting the balance due forced another business rule to be changed. The value is normally a running total (calculated during backout and update) and thus DD_DISPLAYONLY was set to true. DD_DISPLAYONLY is a combination of DD_NOENTER and DD_NOPUT which means that any value moved into the column with a Field_Changed_Value operation never gets stored to disk. The oCustomer_DD object in the BusinessProcess objects remove the DD_DISPLAYONLY flag from the balance due and purchases columns.
To avoid that the value for On_Hand in the INVT table gets too low (or negative) the Adjust_Balances routine in the oOrderDtl_DD needed to be adjusted to let it change the order total but not update the On_Hand value. The code in the DDO does not and should not perform a Forward Send. During the delete process the Backout and Update events are cancelled.
Use of self defined classes
If you scan the source code of this component you will not find any user defined classes. While I normally would say you should create classes I decided not to do this for this example to not scare you off too much. The amount of code is already big (almost 4000 lines of source code). While adding classes would reduce the lines of code a bit I expect it would not improve you understanding the code. The classes created would only be useful for this module, anyway.
Techniques
The orders generator view demonstrates the use of several techniques. XML is used for storage of the values for the order generation. BusinessProcess objects with DDO's are used to remove existing and create new orders. Arrays of several data-types (string, integer, variant, struct) are used. Status- and error logging by a BusinessProcess is shown. Grids with limited data entry are used as well as a grid with free data entry (ordered by names grid) are available. Explanation to the user via multi-line textbox objects and tooltips are shown. Windows API functions and COM objects are used. Enabling and disabling of business and validation rules can be discovered.
Reports
The generated data can be used in Visual Report Writer reports as well as Dynamic AI business intelligence reports. Enclosed you will find a couple of Visual Report Writer 3.0 reports (version 3.0) which will show nice results when more data is generated than in the standard order entry data. So, generate data and play with the reports. Note that before you can run the reports you need to change the data-source location inside Visual Report Writer designer as my test data might be in a different location.
The database should be a SQL database to be able to use the test data in Dynamic AI. There are two ways to get the order data to SQL. The first is to convert the order entry system to an SQL database using one of the DataFlex connectivity kits and the Studio. The second very interessant option, is to use the Visual DataPump to convert the DataFlex data to Microsoft SQL. This tool can be used to do run once but more interesting is a scheduled conversion of your own data.