
Datasets: Google Harvest

Video tutorial available.

Follow basic steps 1-3 for creating a new dataset, select Google Harvest, and then click Next >. The Dataset Wizard window displays:
Figure: Dataset Wizard, Google Harvest

  1. The default dataset name is the time and date of the Google harvest. You may edit the dataset name.
  2. From the Google Site drop-down, select which type of Google search you want to make: All of the Web, Google News, or Google Scholar.
  3. Compose a Google search using the options provided. The options are combined with a Boolean AND, i.e., all conditions must be satisfied for there to be a hit. For example, if you type "hike hiking walk walking" in the field "with at least one of the words" and "dog" in the field "without the words", Google searches for documents that contain "hike", "hiking", "walk", or "walking" but do not also contain "dog". To save this search, check Save these Dataset Settings. Click Next >. If you checked Save these Dataset Settings, a dialog first opens in which you name the settings; the Advanced Google Options panel then appears. In the future, you can repeat the harvest by creating a new dataset with these saved settings.
    Figure: Advanced Google Options
  4. The defaults for Language, Date, and Occurrences are shown in the figure above. You are not required to change them, although you may want to adjust them to hone your query. When you are done, either:
    1. Click Next > and continue with step 5 (Optional Settings), or
    2. To start processing using the default settings, go to step 7.
  5. The Optional Settings panel appears. These settings control the duration of a harvest and can be useful if a harvest runs into problems.
    Figure: Google harvest settings
  6. When you are done, click Next >, and the Filters panel displays.
    Figure: Google harvest filters
    Filters help address problems that can arise during a harvest.
  7. Click Finish to use the default settings for the remaining options and start processing immediately.
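The Boolean AND rule from the Google search step can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the product's implementation: `HarvestQuery`, `is_hit`, and `to_query_string` are hypothetical names, only the two fields from the example are modeled, and matching is plain case-insensitive whole-word comparison, whereas Google's own matching is far more sophisticated. The query string uses Google's `OR` and leading-minus operators as one plausible equivalent; how the wizard actually submits the search is not documented here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class HarvestQuery:
    # Hypothetical stand-ins for two of the wizard's search fields.
    at_least_one: list[str] = field(default_factory=list)  # "with at least one of the words"
    without: list[str] = field(default_factory=list)       # "without the words"


def words_in(text: str) -> set[str]:
    # Lowercase whole-word tokens; real search engines normalize far more.
    return set(re.findall(r"\w+", text.lower()))


def is_hit(query: HarvestQuery, text: str) -> bool:
    words = words_in(text)
    # Condition 1: if the field is filled in, at least one listed word must appear.
    has_one = not query.at_least_one or any(w.lower() in words for w in query.at_least_one)
    # Condition 2: none of the excluded words may appear.
    has_none = not any(w.lower() in words for w in query.without)
    # The conditions are combined with Boolean AND: both must hold for a hit.
    return has_one and has_none


def to_query_string(query: HarvestQuery) -> str:
    # Google's OR operator for alternatives, a leading minus for exclusions.
    parts = []
    if query.at_least_one:
        parts.append(" OR ".join(query.at_least_one))
    parts.extend(f"-{w}" for w in query.without)
    return " ".join(parts)


q = HarvestQuery(at_least_one=["hike", "hiking", "walk", "walking"], without=["dog"])
print(to_query_string(q))                        # hike OR hiking OR walk OR walking -dog
print(is_hit(q, "A long walk along the coast"))  # True
print(is_hit(q, "Hiking with your dog"))         # False: contains an excluded word
print(is_hit(q, "Cycling routes"))               # False: no required word
```

Each additional field in the wizard would add one more condition to the final AND, which is why adding fields can only narrow the set of hits, never widen it.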

Start Processing

The Processing dialog opens, informing you that the dataset is being processed. Click OK. The dataset name appears in the list of datasets in the Dataset Editor window. You can monitor its status during processing by clicking the Refresh button at the top of the Dataset Editor window.

Check the Status of the Google Harvest

While the harvest is ongoing, you can monitor its progress in the Harvest Progress window.


The stages of dataset creation are reflected in the Status column of the Dataset Editor window. To see the complete status details, click the Status button. The Dataset Details window opens.

Dataset Details Window

Use the tabs to view status information about the harvesting, preprocessing, and processing phases of the Google harvest. You can use this information to refine your harvest.
