Demultiplexing and Aligning Sequencing Reads - NGS Prep Kit for sgRNA, shRNA, and DNA Barcode Libraries

Cellecta provides NGS Demultiplexing and Alignment Software for most of its libraries. The program demultiplexes, aligns, and scores Illumina platform sequence data generated from samples of Cellecta libraries prepared using the primers in Cellecta NGS Prep Kits (Cat.#s LNGS-100 through LNGS-999). Below are the instructions for use of this Software.

If you are not using the Cellecta software, please see the section Other Options Instead of Cellecta’s Alignment Software at the bottom of this page.

Program Requirements and Installation

The program runs on 64-bit Windows (Windows 7 or later).

If the option to generate FASTQ files is enabled, sufficient local disk space to store the resulting output is required; this will approximately equal the combined disk space of the input FASTQ files.

For premade libraries, download the appropriate Library Configuration File(s) from the library’s product page on the Cellecta website, under Alignment Software Supporting Files in the Product Information section.

Library Configuration Files, or links to them, are also provided in the “Cellecta Order Information” email sent to customers when a library ships. If you cannot find the Library Configuration File(s) for your library, please contact Cellecta Technical Support at tech@cellecta.com.

To obtain the latest version of the software, click on the original link provided in the “Cellecta Order Information” email at the time of shipment. Please contact us at tech@cellecta.com if you do not have this link. Please note that if you have the older “Library Design Files” that were required by earlier versions of the software (shipped before 2022), you may need updated Library Configuration File(s).

Using the Alignment Software

The program requires the following input files:

The Illumina FASTQ file (.fastq or .fastq.gz) with the raw NGS data.
A Sample Description File with data on which samples were sequenced and what indexes were used for each sample in tab-delimited TEXT (.txt or .tsv) or FASTA (.fa or .fna) format.
The appropriate Library Configuration File(s) in JSON format (.json), which contain(s) the list(s) of target sequences and deconvolution instructions for the Cellecta Library or other compatible library (e.g. Brunello, GeCKO).

The program outputs two files:

A table of aligned counts for each sample (in columns) and target (in rows).
A table of total aligned, unaligned, and ambiguous counts for each sample to assess the quality of the sequencing.

“Unaligned” sequences are those for which no sample or target could be identified.
“Ambiguous” sequences are those for which multiple samples or targets could be identified; they are not expected to occur with most libraries, as library sequences are chosen to be unambiguous within the Hamming distance used.
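
The three categories can be illustrated with a minimal Python sketch. It is not Cellecta's implementation: the target names, sequences, and the distance threshold of 1 are hypothetical, and the rule that any read matching more than one target within the threshold counts as ambiguous is an assumption made for illustration.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify(read: str, targets: Dict[str, str],
             max_dist: int = 1) -> Tuple[str, Optional[str]]:
    """Return ("aligned", name), ("ambiguous", None) or ("unaligned", None)."""
    hits = [name for name, seq in targets.items()
            if len(seq) == len(read) and hamming(read, seq) <= max_dist]
    if len(hits) == 1:
        return "aligned", hits[0]
    if hits:
        # Assumption: more than one target within the threshold is ambiguous.
        return "ambiguous", None
    return "unaligned", None


# Hypothetical 8-nt targets, for illustration only.
targets = {"T1": "ACGTACGT", "T2": "TTGCAAGC"}
print(classify("ACGTACGA", targets))  # one mismatch to T1 only
print(classify("GGGGGGGG", targets))  # no target within distance 1
```

Because library sequences are chosen to differ from one another by more than the allowed distance, the ambiguous branch should rarely be reached with a well-designed library.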

In addition to the files above, there is an option to have the program output an individual FASTQ file for each sample when it demultiplexes the FASTQ file for the sequencing run.

Key Steps for Use of the Software:

Obtain the FASTQ format files from the Illumina instrument. The file names should follow Illumina’s naming convention.
If the samples have not been demultiplexed already, fill out the “Sample Description Input Form” and save the Sample Description File as .txt.
Enter the locations of the FASTQ NGS Files, Sample Description File (if needed), and Library Configuration File(s) into the Program interface.

Setting up the Sample Description File

***Skip this section if your FASTQ files have been demultiplexed already.***

Open the template file NGS-Sample-Description-Input-Form.xlsm. You will enter your sample names into this template, and then save the file with a name descriptive of your experiment.

Note: Please make sure to Enable Macros in Excel, if prompted.

The Sample Description Input Form has 3 columns:

Column 1 lists the primer designation for each barcode.
Column 2 is reserved for the user to enter the sample descriptions.
Column 3 indicates which DNA index sequence will be used to demultiplex the samples.

Enter the sample descriptions in column 2. Fill in a name for each sample at the appropriate location based on the primer designation used to PCR amplify the specific sample, and the DNA index sequence associated with that sample.

Note: Do not skip a description for any sample. No sequencing data will be extracted for samples lacking descriptions. Avoid duplicate descriptions—duplications will be highlighted in red.

Save the Sample table in tab-delimited text format, either by using the “Click to save as Sample Description File” button or by manually saving the table as tab-delimited text (.txt).
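
Before loading the saved file into the program, it can be useful to check it for the two problems noted above: missing descriptions and duplicate descriptions. The sketch below is a hypothetical helper, not part of the Cellecta software. It assumes the saved file keeps the form's three-column order (primer designation, sample description, index sequence) and has no header row; the primer names and index sequences in the example are made up.

```python
import csv
import io
from collections import Counter


def check_sample_rows(text):
    """Return (primers with no description, descriptions used more than once)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    # Column 2 (index 1) is assumed to hold the sample description.
    descriptions = [r[1].strip() if len(r) > 1 else "" for r in rows]
    missing = [r[0] for r, d in zip(rows, descriptions) if not d]
    counts = Counter(d for d in descriptions if d)
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    return missing, duplicates


# Hypothetical file content: P2 has no description, P1 and P3 share one.
example = (
    "P1\tControl_rep1\tACGTAC\n"
    "P2\t\tTTGCAA\n"
    "P3\tControl_rep1\tGGATCC\n"
)
missing, duplicates = check_sample_rows(example)
print("missing:", missing)
print("duplicates:", duplicates)
```

To check a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to `check_sample_rows`.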

Running the Cellecta Alignment Software

Double-click the “Cellecta Alignment Software” link to open the program.

Select the samples. Click the button to navigate the file system and select the text-format Sample Description File corresponding to the experiment that was prepared above.

Select the library sequences for alignment with the Library Configuration File(s). Click the button to navigate the file system and select the appropriate Library Configuration File(s). Some program versions require more than one Library Configuration File (depending on the design of the library being sequenced).

Select the folder with the FASTQ NGS data files. Click the button to navigate the file system and select the folder containing the FASTQ files from the sequencer. The program will report how many distinct lanes it detected in that location. If the FASTQ files are compressed in GZIP format (have the “.gz” extension), it is not necessary to de-compress them.

Notes:

The program identifies FASTQ files by looking for file names following the Illumina Naming Convention, for example, Data\Intensities\BaseCalls\SampleName_S1_L001_R1_001.fastq.gz. Do not alter the Illumina-assigned file names.
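
To see in advance which files in a folder follow the naming convention, and how many lanes they cover, a check along the following lines may help. The regular expression is an assumption based on the example file name above (SampleName_S1_L001_R1_001.fastq.gz); how the Cellecta program itself matches file names is not documented here.

```python
import re

# Pattern assumed from Illumina-style names: <sample>_S<n>_L<lane>_<R|I><n>_001.fastq[.gz]
ILLUMINA_FASTQ = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI]\d)_001\.fastq(?:\.gz)?$"
)


def distinct_lanes(file_names):
    """Return the sorted lane identifiers found among conforming file names."""
    lanes = set()
    for name in file_names:
        match = ILLUMINA_FASTQ.match(name)
        if match:
            lanes.add(match.group("lane"))
    return sorted(lanes)


names = [
    "SampleName_S1_L001_R1_001.fastq.gz",
    "SampleName_S1_L002_R1_001.fastq.gz",
    "SampleName_S1_L001_R2_001.fastq.gz",
    "renamed_reads.fastq.gz",  # does not follow the convention, so it is skipped
]
print(distinct_lanes(names))
```

A renamed file such as the last entry is simply not recognized, which is why the Illumina-assigned names should be left unchanged.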

Click the checkbox to save the individual demultiplexed FASTQ files. The program will run more slowly if the individual FASTQ files option is selected. Also, ensure that there is adequate disk space (i.e., greater than the size of the input FASTQ files) to output these files.
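
The disk-space requirement can be checked before starting a run. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the rule "free space must exceed the total input size" is taken from the guidance above rather than from the program itself.

```python
import shutil
import tempfile
from pathlib import Path


def fastq_input_size(folder):
    """Total size in bytes of the .fastq and .fastq.gz files in a folder."""
    return sum(
        p.stat().st_size
        for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith((".fastq", ".fastq.gz"))
    )


def has_room_for_output(input_folder, output_folder):
    """True if the output location has more free space than the inputs occupy."""
    needed = fastq_input_size(input_folder)
    free = shutil.disk_usage(output_folder).free
    return free > needed


# Small self-contained demonstration with a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "SampleName_S1_L001_R1_001.fastq.gz").write_bytes(b"x" * 1024)
    (Path(tmp) / "notes.txt").write_bytes(b"ignored")
    demo_size = fastq_input_size(tmp)
    demo_ok = has_room_for_output(tmp, tmp)
print(demo_size, demo_ok)
```

In practice, pass the folder containing the sequencer's FASTQ files as `input_folder` and the intended output location as `output_folder`.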

Begin the deconvolution/alignment run. Click the “RUN” button at the bottom of the tab, and choose the name and location of your output file in the file navigator window. If you are saving the demultiplexed files, they will be written to the same location, so please ensure that sufficient disk space is available.

After you confirm the output file name, a new window will pop up describing the run. The run may take several hours; when it is completed, the “DONE” button at the bottom will become active.

Note: Additional runs can be started while waiting for the software to finish. To start an additional run, navigate to the main window and click any of the three buttons to select FASTQ files, samples, or library. The “RUN” button will re-activate and a new process can be initiated.

Note: For CRISPR sgRNA Library samples prepared with Cellecta’s NGS Prep Kit for sgRNA Libraries (Cat.# LNGS-120), Cellecta’s Alignment Software can also be used to demultiplex the Illumina data to make individual FASTQ files for each sample without additional analysis.

Other Options Instead of Cellecta’s Alignment Software

If you are not using Cellecta’s Alignment Software, you will need to demultiplex the run (*.bcl files) into separate FASTQ files. This can be done using the Illumina BaseSpace Sequence Hub or, alternatively, demultiplexing and FASTQ file generation may be done faster using a UNIX server and the Illumina bcl2fastq software. Using the bcl2fastq software also avoids the rather slow step of downloading FASTQ files from BaseSpace. You can obtain the bcl2fastq program at this link. Information on how to use this software is available at the following sites:

The Bioinformatics I/O Blog on the University of Glasgow’s Centre for Virus Research site (for instance, see the blog articles How to generate a Sample Sheet from sample/index data in BaseSpace and How to demultiplex Illumina data and generate fastq files using bcl2fastq).

The BaseSpace Sequence Hub Help Center from Illumina. In particular, see How to download data from Run.
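
As a rough illustration of the bcl2fastq route, the sketch below assembles a typical command line in Python. The three options shown (--runfolder-dir, --output-dir, --sample-sheet) are standard bcl2fastq options, but all paths are hypothetical placeholders, and the settings appropriate for your run should be taken from the resources listed above.

```python
import shlex


def bcl2fastq_command(run_folder, output_dir, sample_sheet):
    """Build an argument list for a basic bcl2fastq demultiplexing run."""
    return [
        "bcl2fastq",
        "--runfolder-dir", str(run_folder),   # sequencer run folder with the BCL files
        "--output-dir", str(output_dir),      # where the FASTQ files are written
        "--sample-sheet", str(sample_sheet),  # sample-to-index assignments
    ]


# Hypothetical paths, for illustration only.
cmd = bcl2fastq_command(
    "/data/runs/my_run",
    "/data/fastq/my_run",
    "/data/runs/my_run/SampleSheet.csv",
)
print(shlex.join(cmd))
```

On a server where bcl2fastq is installed, the same list can be passed to `subprocess.run(cmd, check=True)` to start the conversion.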

Last modified: 17 May 2024
