LS-Join: Local Similarity Join on String Collections


Jiaying Wang, Xiaochun Yang, Member, IEEE, Bin Wang, and Chengfei Liu, Member, IEEE

Abstract—String similarity join, as an essential operation in applications including data integration and data cleaning, has attracted significant attention in the research community. Previous studies focus on global similarity join. In this paper, we study local similarity join with edit distance constraints, which finds string pairs from two string collections that have similar substrings. We study two kinds of local similarity join problems: checking local similar pairs and locating local similar pairs. We first consider the case where if two strings are locally similar to each other, they must share a common gram of a certain length. We show how to do efficient local similarity verification based on a matching gram pair. We propose two pruning techniques and an incremental method to further improve the efficiency of finding matching gram pairs. Then we devise a method to locate the longest similar substring pair for two local similar strings. We conducted a comprehensive experimental study to evaluate the efficiency of these techniques.

Index Terms—Local Similarity Join, Edit Distance, Similar Substrings, Filtering.

• Jiaying Wang, Xiaochun Yang and Bin Wang are with School of Computer Science and Engineering, Northeastern University, Liaoning 110819, China. Xiaochun Yang is the corresponding author. E-mail: [email protected];{yangxc,bwang}@mail.neu.edu.cn.
• Chengfei Liu is with Department of Computer Science and Software Engineering, Swinburne University of Technology, Australia. E-mail: [email protected].

Digital Object Identifier no. 10.1109/TKDE.2017.2687460
1041-4347 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

The problem of similarity join, which is to find similar string pairs from two string collections, is relevant to many data cleaning and data integration applications [11], [24]. Various functions can be used to quantify the similarity between two strings, such as edit distance and Jaccard. Many approaches [2], [15], [17], [19], [22], [28], [31] have been developed to solve this problem.

Existing studies focus on global similarity join. However, in applications such as data integration [13] and bioinformatics [1], it is often important to find similar substring pairs, even if two strings are not similar globally. The following are two motivating examples.

Example 1. In data integration, users often want to match the same entity from different sources. Fig. 1 shows an example of an online shopping mall's purchase list and a supply list from its supplier. Record $r_1$ in Fig. 1(a) and record $s_1$ in Fig. 1(b) describe the same Samsung camera model. In particular, they have two substrings that have slightly different representations. Finding this type of pairs can help us locate records related to the same product, so that we can do a deeper analysis to remove duplicates and integrate information from different sources.

(a) Purchase of goods.

| ID | Strings |
|---|---|
| r1 | Samsung DV150F 16.2MP Smart Camera |
| r2 | Canon EOS ELAN 7E(35mm) SLR Camera |
| r3 | Canon PowerShot SX170 IS MP Digital Camera |
| r4 | Sony W800/B 20 MP Digital Camera |

(b) Supply of goods.

| ID | Strings |
|---|---|
| s1 | New Samsung’s DV150FX camera $85.00 |
| s2 | Best for Beginners: Canon EOS ELAN 7/7E $449.99 |
| s3 | Memory Card for Canon EOS ELAN 7/7E $15.99 |
| s4 | New Canon’s PowerShot SX170IS Camera $99.95 |

Fig. 1: Two product tables.

Example 2. A fundamental problem in protein sequence comparison is to decide whether two sequences share common structural and functional features based on similarity observed in their amino acid sequences. The decision can help scientists detect biologically similar living organisms in a large genome bank. Fig. 2 shows two bio-sequences that are globally dissimilar but share similar substrings. In particular, the underlined substrings of $s_3$ and $s_5$ are similar.

| ID | Strings |
|---|---|
| s1 | DCCADGGCRAARDCRCDD |
| s2 | AGACAGCRRAARCDRAGG |
| s3 | GCAGTACTCAACGATAGC |
| s4 | GGATTACCTAGGCATTCT |
| s5 | ATCATGCACTACTGAACG |
| s6 | GGATTACCTAAGCATTCT |

Fig. 2: Bio-sequences.

In this paper, we study the problem of local similarity join, which finds pairs from two string collections such that they share similar substrings. To evaluate local similarity, we follow [3] and use length and edit distance constraints, since edit distance has been widely used for evaluating string similarity. In [3], the local similarity matching problem is defined as matching any $l$-length pattern with $k$ errors. It finds all locations in the text where an $l$-length substring of the pattern $P$ occurs, with at most $k$ differences. In real applications, $l$ can be set to be the minimal entity length or phrase length, and $k$ should satisfy $k \ll l$. Different from the similarity matching problem in [3], we focus on the local similarity join problem. To the best of our knowledge, this is the first study of the local similarity join problem under an edit distance constraint. It contains two sub-problems: checking local similar pairs and locating local similar pairs. We develop techniques to solve the problem efficiently. We make the following contributions:

• We develop a local similarity join framework for string collections in Section 3. It consists of three steps: (i) finding and pruning matching gram pairs to generate candidate gram pairs, (ii) verifying candidates by extending candidate gram pairs to substring pairs and calculating the edit distance of substring pairs, and (iii) updating the index. The framework can support self-join as well as join between two collections of strings.
• We first focus on the sub-problem of "checking local similar pairs". Local similarity verification based on a matching gram pair is not trivial. Naively, it needs to enumerate all the substring pairs. The existing extension-based method, which conducts exact extension in one string and similar extension in another string, cannot solve the problem. How can verification be done efficiently without enumerating all the substring pairs? We propose techniques in Section 4 to tackle this question.
• Furthermore, there are many matching grams, some of which cannot be extended to the final results, whereas others could cause duplicate verifications. Existing pruning methods do not work for the local similarity problem, thus in Section 5 we propose two new orthogonal pruning techniques to reduce candidates, and an incremental method to boost the process.
• We extend the techniques of "checking local similar pairs" to the sub-problem of "locating local similar pairs" in Section 6. We show that our techniques are general enough to cover these two sub-problems.
• We conduct extensive experiments on real and synthetic datasets to show the efficiency of the proposed techniques in Section 7.

2 PRELIMINARIES

2.1 Problem Description

Let $\Sigma$ be an alphabet of characters. For a string $s$ consisting of characters in $\Sigma$, we use $|s|$ to denote its length, $s[i]$ to …

We say $\langle \alpha, \beta \rangle$ is a longest local similar pair of $r$ and $s$ if $\min(|\alpha|, |\beta|)$ is maximum among all the similar substring pairs of $r$ and $s$.

Consider the example in Fig. 1. Let $l = 10$ and $\tau = 2$. Then $\langle r_1, s_1 \rangle$ is a local similar pair under $\tau$ w.r.t. $l$, since substring $r_1[1, 14]$ = Samsung_DV150F and substring $s_1[5, 20]$ = Samsung’s_DV150F have an edit distance of 2 and their lengths are not less than 10. $\langle r_1[1, 14], s_1[5, 20] \rangle$ is also their longest similar substring pair.

Below we define two sub-problems.

Checking local similar pairs (also called the LS-JOIN CHECKING problem or the "LJC problem"): Given two string collections $R$ and $S$, let $l$ be a length threshold and $\tau$ be an edit distance threshold. The problem of checking local similar pairs is to find all similar string pairs $\langle r, s \rangle \in R \times S$ such that $ed_l(r, s) \le \tau$. We use "$R \bowtie^{C}_{l,\tau} S$" to represent the operation of checking local similar pairs.

Example 3. For the two string sets $R$ and $S$ in Fig. 1, we have $R \bowtie^{C}_{l=10,\tau=2} S = \{\langle r_1, s_1 \rangle, \langle r_2, s_2 \rangle, \langle r_2, s_3 \rangle, \langle r_3, s_4 \rangle\}$. Similarly, for the string set $S$ shown in Fig. 2, we have $S \bowtie^{C}_{l=10,\tau=2} S = \{\langle s_3, s_5 \rangle, \langle s_4, s_6 \rangle\}$.

Locating local similar pairs (also called the LS-JOIN LOCATING problem or the "LJL problem"): For each local similar string pair $\langle r, s \rangle \in R \times S$ with $ed_l(r, s) \le \tau$, locate the longest local similar pair $\langle \alpha, \beta \rangle$ of $\langle r, s \rangle$ under $\tau$ w.r.t. $l$. We use "$R \bowtie^{L}_{l,\tau} S$" to represent the operation of locating local similar pairs.

Example 4. Consider the two string sets $R$ and $S$ in Fig. 1. We have $R \bowtie^{L}_{l=10,\tau=2} S = \{\langle r_1[1, 14], s_1[5, 20] \rangle, \langle r_2[1, 17], s_2[21, 39] \rangle, \langle r_2[1, 17], s_3[17, 35] \rangle, \langle r_3[1, 21], s_4[5, 27] \rangle\}$. Similarly, for the string set $S$ in Fig. 2, $S \bowtie^{L}_{l=10,\tau=2} S = \{\langle s_3[1, 13], s_5[6, 18] \rangle, \langle s_4[1, 18], s_6[1, 18] \rangle\}$. The underlined substrings in Figs. 1 and 2 are answers of the examples.

2.2 Edit Distance Matrix

Edit distance can be computed using a matrix-filling dynamic programming algorithm [23], which reserves a matrix to hold the edit distances between all the prefixes of two strings. We use $\langle i, j \rangle$ to denote a cell at the $i$-th row and $j$-th column in an edit distance matrix, and use $D(i, j)$ to denote the cell value. The edit distance matrix is constructed based on the recurrence relation given in Equation 1, in which $c_{ij} = 1$ if $s_1[i] \ne s_2[j]$; otherwise $c_{ij} = 0$. Initially, $D(i, 0) = i$ and $D(0, j) = j$. For two strings $s_1$ and $s_2$, we have $ed(s_1[1, i], s_2[1, j]) = D(i, j)$, so $ed(s_1, s_2) = D(|s_1|, |s_2|)$.

$$D(i, j) = \min\{\, D(i-1, j) + 1,\; D(i, j-1) + 1,\; D(i-1, j-1) + c_{ij} \,\} \qquad (1)$$