Accessing Data in a Dataset
Each dataset has a few functions that can be called to access different parts of its data. These are listed in the table below.
|getColumnAsList(colIndex)||Returns the column at the specified index as a list.|
|getColumnCount()||Returns the number of columns in the dataset.|
|getColumnIndex(colName)||Returns the index of the column with the name colName.|
|getColumnName(colIndex)||Returns the name of the column at the index colIndex.|
|getColumnNames()||Returns a list with the names of all the columns.|
|getColumnType(colIndex)||Returns the type of the column at the index colIndex.|
|getColumnTypes()||Returns a list with the types of all the columns.|
|getRowCount()||Returns the number of rows in the dataset.|
|getValueAt(rowIndex, colIndex)||Returns the value at the specified row and column indexes.|
|getValueAt(rowIndex, colName)||Returns the value at the specified row index and column name.|
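The functions in the table can be sketched as follows. Because the system library is only available inside Ignition, this example uses a small hypothetical stand-in class, StandInDataset, that mimics a few of the dataset functions so the snippet can run anywhere. The city names and values are placeholders, not real figures.

```python
class StandInDataset(object):
    """Hypothetical stand-in mimicking a few Ignition dataset functions."""

    def __init__(self, headers, rows):
        self._headers = list(headers)
        self._rows = [list(row) for row in rows]

    def getRowCount(self):
        return len(self._rows)

    def getColumnCount(self):
        return len(self._headers)

    def getColumnNames(self):
        return list(self._headers)

    def getColumnIndex(self, colName):
        return self._headers.index(colName)

    def getColumnAsList(self, colIndex):
        return [row[colIndex] for row in self._rows]

    def getValueAt(self, rowIndex, col):
        # Accept a column index or a column name, matching the two table entries.
        if not isinstance(col, int):
            col = self.getColumnIndex(col)
        return self._rows[rowIndex][col]


# Placeholder values, not real population figures.
data = StandInDataset(["City", "Population"],
                      [["Los Angeles", 100], ["New York", 200]])

row_count = data.getRowCount()                 # 2
col_count = data.getColumnCount()              # 2
col_names = data.getColumnNames()              # ["City", "Population"]
pop_index = data.getColumnIndex("Population")  # 1
cities = data.getColumnAsList(0)               # ["Los Angeles", "New York"]
by_index = data.getValueAt(0, 1)               # 100
by_name = data.getValueAt(1, "Population")     # 200
```

Inside Ignition, the same calls would be made directly on a real dataset object, with no stand-in class.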
Looping Through a Dataset
Oftentimes you need to loop through the items in a dataset similar to how you would loop through a list of items. You can use the functions above to do this.
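A minimal sketch of such a loop is shown here. It walks every row and column index using getRowCount, getColumnCount, and getValueAt. The StandInDataset class and the collect_values helper are hypothetical and exist only so the snippet runs outside Ignition; the data values are placeholders.

```python
class StandInDataset(object):
    """Hypothetical stand-in exposing only the functions this loop needs."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def getRowCount(self):
        return len(self._rows)

    def getColumnCount(self):
        return len(self._rows[0]) if self._rows else 0

    def getValueAt(self, rowIndex, colIndex):
        return self._rows[rowIndex][colIndex]


def collect_values(data):
    """Visit every cell by index and gather the values row by row."""
    result = []
    for row in range(data.getRowCount()):
        row_values = []
        for col in range(data.getColumnCount()):
            row_values.append(data.getValueAt(row, col))
        result.append(row_values)
    return result


# Placeholder values, not real population figures.
data = StandInDataset([["Los Angeles", 100], ["New York", 200]])
values = collect_values(data)
```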
Accessing Data in a PyDataset
PyDatasets are special in that they can be handled like other Python sequences. Any dataset object can be converted to a PyDataset using the function system.dataset.toPyDataSet. All of the functions listed above can be used on a PyDataset, but the data can also be accessed much more easily, much as you would access items in a list.
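The list-style access described above can be sketched like this. A plain list of lists stands in for the PyDataset so the snippet runs outside Ignition; that substitution and the placeholder values are assumptions made for illustration.

```python
# Inside Ignition this would be: py_data = system.dataset.toPyDataSet(data)
# Here a list of lists stands in for the PyDataset (placeholder values).
py_data = [["Los Angeles", 100], ["New York", 200]]

first_row = py_data[0]      # a whole row
first_city = py_data[0][0]  # row 0, column 0
second_pop = py_data[1][1]  # row 1, column 1
```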
Looping Through a PyDataset
Looping through a PyDataset is also a bit easier, working similarly to other sequences. The first for loop pulls out each row, which acts like a list and can be used in a second for loop to extract the values.
Additionally, a single column of data can be extracted by looping through the PyDataset.
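Both loops can be sketched as follows. As an assumption for illustration, a plain list of lists stands in for the PyDataset that system.dataset.toPyDataSet would return inside Ignition, and the values are placeholders.

```python
# Stand-in for a PyDataset (placeholder values, not real figures).
py_data = [["Los Angeles", 100], ["New York", 200]]

# Nested loops: the outer loop yields rows, the inner loop yields values.
all_values = []
for row in py_data:
    for value in row:
        all_values.append(value)

# Single column: take the same position from every row.
city_column = []
for row in py_data:
    city_column.append(row[0])
```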
A PyRow is a row in a PyDataset. It works similarly to a Python list.
The examples and outputs are based on the results in the table below. In addition, "print" commands are used, but should be replaced by appropriate logging methods (such as system.util.getLogger) depending on the scope of the script.
|index()||Returns the index of the first occurrence of the element. Raises a ValueError if the element isn't present in the row.|
|count()||Returns the total number of occurrences of the given element in the row.|
You can also have repeating elements in a row:
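A short sketch of index() and count() on a row with a repeated element is shown here. A plain Python list stands in for the PyRow, which is an assumption made so the snippet runs outside Ignition; the values are placeholders.

```python
# Plain list standing in for a PyRow; the value 100 appears twice.
row = ["Los Angeles", 100, "PST", 100]

first_match = row.index(100)   # index of the first occurrence
occurrences = row.count(100)   # how many times the value appears

# index() raises ValueError when the element is not in the row.
try:
    row.index("EST")
    found_missing = True
except ValueError:
    found_missing = False
```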
Altering a Dataset
Technically, you cannot alter a dataset. Datasets are immutable, meaning they cannot change. You can, however, create new datasets. To change a dataset, you create a new one and replace the old one with it. System functions are available that alter or manipulate datasets in other ways. Any of the functions in the system.dataset section can be used on datasets; the most common ones are listed below:
The important thing to realize about all of these functions is that, again, they do not actually alter the input dataset. They return a new dataset. You need to use that returned dataset to do anything useful.
For example, the setValue function can be used to change the population value for Los Angeles.
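The return-a-new-dataset behavior can be sketched as follows. Inside Ignition the call would be system.dataset.setValue(dataset, rowIndex, columnName, value). Here a hypothetical StandInDataset class and set_value function imitate that behavior so the snippet runs anywhere. The population numbers are placeholders.

```python
class StandInDataset(object):
    """Hypothetical immutable stand-in for an Ignition dataset."""

    def __init__(self, headers, rows):
        self._headers = tuple(headers)
        self._rows = tuple(tuple(row) for row in rows)

    def getColumnIndex(self, colName):
        return self._headers.index(colName)

    def getValueAt(self, rowIndex, colName):
        return self._rows[rowIndex][self.getColumnIndex(colName)]


def set_value(dataset, rowIndex, colName, value):
    """Stand-in for system.dataset.setValue: builds and returns a new dataset."""
    col = dataset.getColumnIndex(colName)
    rows = [list(row) for row in dataset._rows]
    rows[rowIndex][col] = value
    return StandInDataset(dataset._headers, rows)


# Placeholder values, not real population figures.
cities = StandInDataset(["City", "Population"],
                        [["Los Angeles", 100], ["New York", 200]])

# The original dataset is left untouched; the change lives in the result.
updated = set_value(cities, 0, "Population", 150)

old_pop = cities.getValueAt(0, "Population")   # still 100
new_pop = updated.getValueAt(0, "Population")  # 150
```

To make the change take effect in a project, the returned dataset has to be written back to wherever the original was stored.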