ORDS How-To 03: Upload a database

1. About this guide

This document is one of a series of How-To guides for the Online Research Database Service (ORDS).

Please note that the ORDS is now in the process of being wound down. From 1 June 2017, the service will cease to be supported by the University of Oxford. We therefore strongly recommend that the ORDS is not used for new projects. Please contact the Research Support team to discuss alternative options.

Contents

1.1. Pre-requisites

This How-To assumes that you have already registered to use the ORDS, and have already either created a project, or been given access to one as a project administrator.

For instructions on how to do this, see the earlier guide in this series, How to get started using the ORDS and How to create and manage projects.

2. Preparing data for upload to the ORDS

2.1. File formats compatible with the ORDS

Three types of file can be uploaded to the ORDS:

  • Files created using Microsoft Access (with either the .mdb or .accdb file extension)
  • PostgreSQL files
  • CSV (comma-separated values) files

Only a single Access or PostgreSQL file can be uploaded into each ORDS database. However, the ORDS allows the upload of multiple CSV files into the same database: the contents of each file will appear as a separate table in the database. (You can then use the ORDS schema designer to add relationships between the tables – see the later guide in this series, How to build and structure databases.)

If your data is in a single Access or PostgreSQL file, you do not need to do anything else to prepare your data for upload to the ORDS. See section 3 below for details of how to proceed.

If your data is in multiple Access or PostgreSQL files, you will need either to combine these into a single file of the same format, or to convert the files to CSV format before uploading.

If your data is in one or more CSV files, you will need to ensure that these are properly formatted before uploading. See section 2.2. below for details.

If your data is in another format, see section 2.3. below.

2.2. Preparing CSV files for upload

CSV is a file format which allows tabular data to be saved as plain text. Each row in the table is represented by a line in the CSV file, and the breaks between columns are represented by commas. For example, a simple table like this:

Index Name Age City
1 Fred 25 London
2 Karen 40 Manchester
3 Joan 33 Cardiff
4 Michael 55 Oxford

would be represented in a CSV file by:

Index, Name, Age, City
1, Fred, 25, London
2, Karen, 40, Manchester
3, Joan, 33, Cardiff
4, Michael, 55, Oxford

Before uploading a CSV file to the ORDS, it's important to ensure that it is properly formatted. Some key things to remember are listed below.

2.2.1. Each field needs a name

The first row of the file should contain a comma-separated list of the field (column) names, and no field should be left without a name.

Field names in ORDS can include spaces. However, these can sometimes cause problems in some other systems, so if you wish to be able to use data exported from the ORDS elsewhere, it may be safer to substitute underscores.

When you upload a CSV file to the ORDS, an extra field will be added to serve as the primary key for the table (see section 4.1.1. below). The name of this field will be the name of the file, followed by _index - so, for example, if the file is called data.csv, the primary key field will be called data_index. If there is already a field name which follows this pattern, ORDS will not be able to process the file (as this would involve creating a second field with the same name) and an error will result. To avoid this, rename the field.

2.2.2. Each row in the file should contain the same number of fields

If a CSV file does not contain a consistent number of fields throughout, ORDS will not be able to process it.

For example, this table would produce an error message, because record 3 contains only three fields, rather than (as in the rest of the table) four:

Index, Name, Age, City
1, Fred, 25, London
2, Karen, 40, Manchester
3, Joan, 33
4, Michael, 55, Oxford

Note, however, that this does not prevent fields from being empty, or containing a null value. However, the data needs to be properly formatted, to allow ORDS to recognize that this is what is intended. For example, in the sample table above, record number 3 should read:

3, Joan, 33,

The extra comma at the end indicates that the record concludes with an empty field. (Without the extra comma, there would be nothing to indicate to ORDS where in the record the empty field should come.)

(If the file has been created using a program such as Microsoft Excel and then saved in CSV format, empty fields should be formatted in this way automatically.)

2.2.3. The file should contain only the field names and the data

When storing data in a spreadsheet, it can sometimes be useful to add a title at the top of the sheet, to use blank rows or columns for spacing, or to add other non-data content (for example, explanatory notes or references at the end of the spreadsheet). Unfortunately, as these do not fit into the rigid confines of the CSV format, ORDS is unable to interpret them properly.

In some cases, this may result in an error when trying to upload the file. The CSV file should therefore consist only of the field names in the top row of the file, and the data in the second and subsequent rows.

If you wish to add notes or comments relating to an entire data table, or to a specific field within a table, you can add these after upload using the ORDS schema designer. See the later guide in this series, How to build and structure databases, for details.

2.3. Other file formats

If your data is in a format other than those listed in section 2.1. above, you will need to convert it to one of these before you can upload it. It is usually most straightforward to convert files to CSV format. (For a description of CSV files, see section 2.2. above.)

Because CSV uses only commas and line breaks to indicate the data structure, it is a very flexible format. Many different software applications can interpret information presented in this way, and hence it is often used as a way of moving data from one application to another.

This means that many pieces of software offer a Save as or Export option which will allow you to save your data as CSV files. You should, however, be aware that because CSV is a plain-text format, then formatting, relationships between data, and other features such as hyperlinks are likely to be lost when you do this. It is therefore usually a good idea to keep a copy of the original file for future reference. (Once the data is in the ORDS, it may be possible to recreate relationships between data using the ORDS schema designer - see the later guide in this series, How to build and structure databases. For advice on how best to achieve this, you can contact the ORDS help desk by emailing ords@it.ox.ac.uk.)

A step-by-step example of converting a Microsoft Excel spreadsheet to CSV format is given in the Appendix below.

Once you have saved your data in CSV format, you will need to check that the CSV files are properly formatted for upload - see section 2.2. above for details.

If you need further assistance preparing your data for upload to the ORDS, please contact the ORDS help desk by emailing ords@it.ox.ac.uk.

3. Creating a new empty database

3.1. Getting started

The first step is to create an empty database in the ORDS system.

Log in to the ORDS via https://app.ords.ox.ac.uk, You will see a list of your ORDS projects. Click the name of the project in which you would like to create the new database to be taken to the project page.

If the project does not currently contain any databases, you will see a message indicating this:

No databases message

If the project already contains databases, you will see a list of these, with the Add new ORDS database button above it.

Click the Add new ORDS database button.

3.2. The Add Database to Project form

You will now see the Add Database to Project form. This allows you to enter some basic information (or metadata) about the database you wish to create.

Add Database to Project form

Two fields in the form are mandatory: Name and Description.

The database Name can be whatever you choose, providing it is at least two characters long. However, brackets should be avoided in databases names, as these can sometimes cause a system error resulting in data being lost.

The Description is intended to provide a summary of key information about the database. The field can be used to describe the subject matter and scope of the database, and to supply other relevant information such as details of the people involved in creating or working on it, or how it relates to a larger project.

The database name and description will appear in the list of databases you and other project members see on the project page.

You can also supply a creation date for the database. This does not have to be the date on which you first add the database to the ORDS; if you are uploading a pre-existing database, it may have been created some time earlier.

You can type the date into the box directly, using the DD/MM/YYYY format. If you do not wish to specify the date this precisely, you can add just the month and year (MM/YYYY), or just a year (YYYY).

Alternatively, you can select a date using the calendar. To do this, click the small calendar icon at the end of the Created on field.

Created On field

A calendar will appear. Use the arrow keys on the top row to move to the appropriate month and year, then click the date you want. It will appear in the field.

Created On field showing calendar

The two final fields in the form are Data sources and Data gathering process. These allow you to document where the data initially came from, and how it was collected. The amount of information it is appropriate to include here will vary from project to project (and for some projects, one field may be more relevant than the other), but it is good practice to record this information somewhere. This is especially important if the database will be made public or shared with other researchers at some point. You will be able to edit or add to these fields later if you choose.

When you are happy with the contents of the form, click Add to create the database.

You will be taken back to the project page, and should see a message confirming that the database has successfully been created.

The name of the new database will now appear in the list of ORDS databases for the project. Note that because this is as yet an empty database, the number of database versions is currently zero.

Database list showing empty database

Click the database name to go to the database page.

4. Uploading an existing data collection

You can now upload your database or data collection into the empty ORDS database you have just created.

However, before doing so, it is worth being aware of how ORDS will handle some important aspects of your data - primary keys and table names.

4.1.1. Primary keys in the ORDS

Each table in a database needs to have a field known as a ‘primary key’. This field is used by the system as a means of uniquely identifying records within the table. The content of this field therefore needs to be unique for each record in the table.

To see why this is important, imagine a table that gives personal information about a group of people. Suppose that two people in the group have the same name – ‘Jane Smith’, for example. The database system needs a way of distinguishing between these; otherwise, when asked to retrieve information about Jane Smith, there would be no way of telling which of the two Janes this referred to. The easiest way to avoid this problem is to give each record a unique identifier which can serve as the primary key. This is often called an ‘ID’ or ‘index’ field.

If you are uploading Access or PostgreSQL files, each table will already have a primary key field, and the ORDS will simply use this.

If you are uploading CSV files, these may not have a field suitable to be a primary key. For each table created from a CSV file, ORDS therefore adds an extra field to serve as the primary key. The default name for this field is the name of the file uploaded, with _index appended at the end (you can change this name after upload if you choose – for instructions, see the later guide in this series, How to build and structure databases). As noted in section 2.2.1. above, this means the upload attempt will fail if a CSV file already contains a field whose name is the name of the file followed by _index: if you have such a field, you will need to rename it before uploading.

The primary key field created by ORDS will contain a unique numerical value for each record. It will be set as an autoincrement field, meaning that if you add new records to the table, ORDS will automatically complete this field for you.

4.1.2. Tables names in the ORDS

If you upload an Access or PostgreSQL file, ORDS will retain the existing table names. If you upload CSV files, ORDS will use the file names as the table names.

4.2. The database page

The database page displays your database’s details, and allows you to manage the database (see the later guide in this series, How to manage ORDS databases and database versions, for more details).

Database page before upload

ORDS allows you to create up to three versions of each database. Details of how to do this, and of what you might use each version for, can be found in the later guide, How to manage ORDS databases and database versions.

When you upload a database, it automatically becomes the main database version. In the lower left-hand part of the screen, you will see the Main database version information pane.

Click the Upload existing database button.

4.3. Uploading your file

You will be taken to the Upload a Database to ORDS page. This allows you to select a file to upload.

Upload a Database to ORDS page

Click the Browse... button. (The labelling of this button may vary depending on the browser you are using.)

In the dialogue box that opens, navigate to the file that you wish to upload, click the file name to select it, and then click Open. (The labelling of this button may also vary.)

File selection dialogue box

The file name will now appear on the Upload a Database to ORDS page.

Upload a Database to ORDS page, with file name highlighted

If you have accidentally selected the wrong file, you can change it by clicking Browse... again.

Once you are happy that the correct file has been selected, click Upload.

You will be returned to the project page, and should see a message confirming that the database will be uploaded. You should also receive a notification email telling you that your database has been sent to the ORDS servers for processing.

You will also see that the number of database versions displayed alongside the database name has changed from zero to one.

Database list showing one database version

If you click the database name to return to the database page, you should see a message confirming that the database is currently being uploaded. For small databases, the upload process should be fairly quick; for larger files, it may take longer.

Database version information panes during upload

You will also see that two additional panes have appeared in the lower part of the screen – one for Milestone database version information, and one for Test database version information. See the later guide in this series, How to manage ORDS databases and database versions, for more information about these.

When the upload process has completed, you will be sent an email notification confirming that the database has been successfully imported.

Once this has happened, a set of new buttons will appear in the database version information panes. (You may need to refresh your browser to see these.)

Database version information panes after file upload

You are now ready to start working with your data in the ORDS.

Alternatively, if the file you uploaded was in CSV format, you have the option of uploading further files to create additional tables.

4.4. Uploading multiple CSV files

If the file you have uploaded is a CSV file, you will see an additional option in the Main database version information pane: Upload another CSV file as a new table.

Database version information panes with Upload CSV link

Click this. You will be taken to the Upload a CSV File page.

Upload a CSV File page

Click Browse.... In the dialogue box that opens, navigate to the file that you wish to upload, click the file name to select it, and then click Open.

The file name will now appear on the Upload a CSV File page. Click Upload.

As with the first CSV file, you should see a message confirming that the data has been saved, and should also receive another pair of notification emails – one confirming that processing of your data has begun, and another confirming that it has been successfully imported into the ORDS.

Repeat the above process for all the CSV files you wish to upload.

You are now ready to start working with your data in the ORDS.

5. What next?

You may want to have a look at some of the later guides in this series. For a full list, visit the ORDS documentation home page.

You can also find out more about the ORDS service by visiting the ORDS home page, http://ords.ox.ac.uk/. If you have specific queries, you can contact the ORDS help desk by emailing ords@it.ox.ac.uk.

 

Appendix: Step-by-step example – saving Microsoft Excel files in CSV format

Microsoft Excel spreadsheets are widely used to store and organize tabular data. While Excel files cannot be uploaded directly to the ORDS, converting them to CSV files is quick and straightforward.

The instructions below use Excel 2010. Other versions of Excel may look slightly different, but the basic process will be similar.

Step 1 - Open the file in Excel

Begin by opening the file which contains the data you wish to upload to the ORDS. The example used here is a simple dataset consisting of two worksheets – these can be seen as tabs in the bottom left-hand corner of the window. Each worksheet will need to be saved as a separate CSV file.

Excel spreadsheet with multiple worksheets

If the first worksheet you wish to save as CSV is not already visible, select it by clicking the appropriate tab.

Step 2 - Save the first worksheet as a CSV file

To save the worksheet, click File (or in Microsoft Office 2007, the Office button) and then Save As.

The lower part of the dialogue box that appears includes fields labelled File name: and Save as type:

Excel Save As dialogue box

Change the file name to reflect the fact that you are saving a copy of just one sheet, rather than the workbook. This file consists of two worksheets, People and Places. As this is the People worksheet, People seems an appropriate file name. This name will also be used by ORDS as the name of the database table that will be created when this file is uploaded.

Next, click the small downward-pointing arrow at the right-hand end of the Save as type: field to view the list of available file types.

Excel Save As dialogue box showing file types

Select CSV (Comma delimited), then click Save.

If your Excel workbook contains multiple worksheets, you will see a message warning you that the file type does not support this.

Excel warning message re multiple sheets

This simply means that (as already noted) the worksheets need to be saved in separate files. Click OK to save the sheet which is currently active.

You will then see a second message warning you that the file may contain features which are not compatible with the CSV format. These features may include formatting (italics, bold, use of colour, etc.), Excel formulas, hyperlinks, and anything else which is not straightforward textual or numerical data. These features will be stripped out when the file is saved as CSV. (Note, though, that a copy of the original file will also remain on your system.)

Excel warning message re features

Click Yes.

The file you are now looking at is the CSV copy. However, this may not be immediately obvious, as Excel will not change the file’s appearance (to remove formatting and so forth) until after you close it and reopen it.

Step 3 - Close and reopen the file to view CSV formatting

(This step is optional, and is simply to allow you to see the difference made by saving a file in CSV format. If you prefer, you can proceed immediately to Step 4 below.)

When you close the file, Excel will ask if you wish to save the changes you have made. Rather irritatingly, if you click Save, Excel will take you through the sequence of warnings listed above for a second time. You can either work through the warnings again, or you can click Don't save – as long as you have not made any changes to the data since you last saved it, you will not lose anything if you do this.

If you wish to reopen the file using Excel, you may find that it is not immediately visible in the list displayed in the Open dialogue box. This is because by default, Excel shows only Excel files. You can change this using the file types drop-down list in the bottom right-hand corner of the dialogue box.

Excel file types drop-down list

 

Change the selected option from All Excel Files to All Files, and your CSV file should appear in the list. You can now reopen the file.

When you do this, you will see that Excel has stripped out the formatting, leaving just the data in plain text form. Also notice that the file now only has one worksheet, rather than two.

Excel spreadsheet saved in CSV format

Important note: Excel files beginning with ID

If the contents of the first cell in your spreadsheet begin with ID (because, for example, your dataset includes an ID number for each record), you may see a warning message like the one below when you try to open it:

Excel spreadsheet file extension warning

This is because Excel automatically tries to interpret any text file beginning with ID as a SYLK file (this is a specialist file format used to exchange data between applications).

In this case, of course, the file is not actually a SYLK file. If you click Yes, you will therefore see a message like the following one:

Excel spreadsheet SYLK file warning

These two warning messages in fact cancel each other out: the first appears because Excel believes the file is a SYLK file, and the second when it discovers that it is not.

Click OK, and Excel should open the file as a CSV file without any further problems.

Step 4 - Saving the second and subsequent worksheets

To create CSV files for the remaining worksheets, you will need to reopen your original Excel file. Use the tabs in the bottom left-hand corner to select the next worksheet.

Repeat the process described in Step 2 above to create a CSV file of the data contained in this worksheet. (If you wish, you can also follow the process described in Step 3 to view the CSV version.)

If your spreadsheet contains more than two worksheets, repeat the process for each of the other worksheets.

And finally...

You should now have a collection of CSV files which can be uploaded into the ORDS to form the tables of a database.

If you need further advice about preparing your data for use in the ORDS, you can contact the ORDS help desk by emailing ords@it.ox.ac.uk.

Service area: 

Written by IT Services. Latest revision 31 August 2016