Request for Proposal/Quotation Questions/Answers

Request for Proposal / Quotation Questions/Answers For Enterprise Data, ETL and Analytics Platform (EDEAP)

March 27, 2017

Rhode Island Quality Institute
50 Holden Street, Suite 300
Providence, RI 02908

Questions and Answers

1. Do you want one hard copy and one electronic copy of the completed RFQ?
   a. One of either is fine.
2. Is a WinZip encrypted file attachment containing a Word or PDF format document acceptable?
   a. Yes.
3. Is cloud an option for the RDBMS or do you want an on-premise solution only?
   a. Yes, cloud is an option.
4. The current Caché database is 15TB and expected to grow to 40TB in three years’ time. What is the nightly throughput?
   a. Our answer assumes this question is related to the expected nightly throughput, via the ETL solution, to the proposed RDBMS. Expected data for the proposed RDBMS in this proposed environment is only about 1TB, with expected growth to 3-5TB in around three years’ time. If the ETL solution is CDC-capable, then the nightly throughput to the proposed RDBMS is minimal. If not, then the throughput will be whatever the current database contains.
5. What is the ideal update frequency for the proposed RDBMS? Intra-day? Nightly? Weekly? Etc.
   a. Ideally, nightly. However, for our use cases, weekly or monthly is acceptable too.
6. How much of the data is new vs changed?
   a. We are not sure what is meant by new vs changed data. If new data is “brand new clinical records” and changed is “old clinical record that had an update”, then approximately 98% is new.
7. How many people will need access to the ETL tool? How many people will need access to Statistical tools?
   a. ETL: probably two, maybe three people will need access.
   b. Statistical tool: likely eight to 10 people.
8. How many people may be connected at any given time? How many people will actively be running statistical procedures at any given time?
   a. There could be eight to 10 people utilizing the statistical tools at any given time in the analytics environment, assuming the runtime/production environment is separate.
9. What is the number of interactive reports versus batch/export files?
   a. 50/50
10. Is there an infrastructure that will be reused?
   a. Yes, please see the RFP.
11. Is there a remote or device focus?
   a. No.
12. Is the data relational data or unstructured?
   a. Most of the data is relational; however, some of the fields contain unstructured text.
13. What kinds of analytics?
   a. All kinds, but mainly general descriptive statistics.
14. Can you please explain what you mean by ontologies?
   a. The fact that people use different terminology to mean the same thing presents a challenge, especially in the healthcare industry, where access to and understanding of data are critical to patient care. The fact that computers can’t understand these differences further complicates the problem. Are you able to translate discrete codified data into concepts and then into human language?
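As an illustration of the ontology question in item 14, the sketch below maps discrete source codes to a shared concept and then to a human-readable label. It is a minimal sketch only, not RIQI's or any vendor's implementation: the names CODE_TO_CONCEPT, CONCEPT_LABELS, and describe are hypothetical, and treating these codes as Hemoglobin A1c identifiers is an assumption based on their appearing together in the example query under question 18.

```python
# Hypothetical lookup tables; a real ontology service would hold far more codes.
# Assumption: these source codes all identify a Hemoglobin A1c test.
CODE_TO_CONCEPT = {
    "17856-6": "HBA1C",
    "4548-4": "HBA1C",
    "HGBA1C": "HBA1C",
    "HBA1C": "HBA1C",
}

# Concept identifier -> wording a person would recognize.
CONCEPT_LABELS = {"HBA1C": "Hemoglobin A1c"}


def describe(code: str) -> str:
    """Translate a source code to a concept, then to human language."""
    concept = CODE_TO_CONCEPT.get(code.strip().upper())
    if concept is None:
        return "Unmapped code: " + code
    return CONCEPT_LABELS[concept]
```

Keeping the code-to-concept step separate from the concept-to-label step means a newly encountered local code only needs one new entry in the first table.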

RFP/RFQ Template v.10.11 CONFIDENTIAL

15. What is the impact of “do nothing”?
   a. We don’t meet our client obligations.
16. Is there a desire to continue to build cubes? Is this done today due to performance considerations?
   a. No, there is no desire to build cubes. These are done in Caché automatically to surface data for their interfaces.
17. What are the SLAs for analytic procedures/queries you want to run?
   a. Faster than they are now. Currently some simple count/group-by queries take on the order of four hours to complete. Our target is for these to complete in under five minutes.
18. Can you share with us some examples of queries today with associated run times?
      select count(distinct l.patient)
      from hsaa.labresultitem l
      left join hsaa.patient p on p.id=l.patient
      Where (cast(datediff(day,p.birthdate,'2016-03-31')/365.25 as int) between 18 and 75)
        and (l.TestItemCode_code='17856-6'
          OR l.TestItemCode_code='TST500'
          OR l.TestItemCode_code='HGBA1C'
          OR l.TestItemCode_code='50026400'
          OR l.TestItemCode_code='11271'
          OR l.TestItemCode_code='11717'
          OR l.TestItemCode_code='4548-4'
          OR l.TestItemCode_code='1008985'
          OR l.TestItemCode_code='HBA1C')
        and (%NOINDEX resulttime >= '2015-04-01 00:00:00' and %NOINDEX resulttime <='2016-03-31 23:59:59')
        and isnumeric(l.resultvalue)=1;
   a. The above query took 7 hours and 15 minutes to run.
   b. Select count(*) from hsaa.hsaaorder;
      i. Took 361 seconds.
19. Can you share with us the DDL of current databases?
   a. No, not now.
20. Are there any parts of your current solution that you intend to keep, must keep, or want to architect out of the target solution?
   a. We are keeping what we currently have. This new environment will sit alongside it.
21. Will a remote desktop or terminal be used?
   a. Yes, both.
22. How many records and fields are in your largest data set?
   a. Measured by row count, our largest dataset has 35 fields and ~120 million rows. Other tables have fewer rows but more fields: for example, 183 fields by 11.7 million rows, and 114 fields by 44.9 million rows.
23. There are questions surrounding pushing out to reporting tools (referencing Product Features question 12/15). Is this to be considered a fourth aspect of this proposal?
   a. If your proposed analytical tools have reporting capabilities built in, then please consider these questions. We are not actively looking for a new reporting tool.
24. How do you define reporting? Are you looking for a dashboard report or a rows/columns style output?
   a. Dashboards.
25. Do you have any open form text that you plan to analyze? What format is this in?
   a. Yes. This text is in various fields contained within the tables (e.g. “Admission Reason Description”).
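The age filter in the example query under question 18 can be worked through as follows. This is a minimal Python sketch of the same arithmetic, not the Caché SQL itself; the function names are hypothetical, and it assumes the cast to int truncates the fractional part.

```python
from datetime import date

REFERENCE_DATE = date(2016, 3, 31)  # the literal date used in the example query


def age_in_years(birthdate: date, reference: date = REFERENCE_DATE) -> int:
    """Mirror cast(datediff(day, birthdate, reference) / 365.25 as int)."""
    days = (reference - birthdate).days
    return int(days / 365.25)  # assumption: the cast truncates


def in_age_range(birthdate: date, low: int = 18, high: int = 75) -> bool:
    """Mirror the query's 'between 18 and 75' test (inclusive on both ends)."""
    return low <= age_in_years(birthdate) <= high
```

Dividing by 365.25 is an approximation, so the result can differ from calendar age by a day around a birthday.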

26. We are unclear on what is meant by your experience on page 8: do you need a client reference or is it compatible?
   a. Compatible
27. Referring to data questions numbers 7-9, are these questions directed at the database itself or for your statistical requirements? Or both?
   a. Database
28. One of your questions asks what statistical features are available; do you want a literal breakdown of the functions?
   a. No, a high-level list is acceptable. If you have links to the more complete list, it would be helpful to include those.
29. Can you accept a Word document with embedded references/spreadsheets?
   a. Yes
30. What is your vendor selection process? Do you need help with your use case justification?
   a. We have put together an evaluation matrix that considers the proposed solution. The results from this evaluation matrix will drive the chosen solution. We do not require assistance with our use case justification.
31. What kind of integration do they seek across the enterprise for ETL?
   a. Please see the RFQ. We are seeking to integrate most of our data sources into the proposed RDBMS, plus others.
32. What are the data sources?
   a. Please see the RFQ. They are listed in there.
33. Can you describe what the data is?
   a. The data is relational. The data is mostly clinical healthcare data from various settings and organizations. See RFQ for details on data sources.
34. Can you please provide examples of what you mean regarding data cleansing?
   a. E.g. clinical encounter dates or birthdates that are in the future or too far back in the past.
   b. The precision of time.
35. What kinds of transformations are you looking to apply on your data?
   a. E.g. sizing data fields more appropriately, recoding data fields to normalize them to specific values.
36. What are the key requirements for the ETL tool?
   a. We are looking for an ETL tool that can perform some profiling, transformations and cleansing of our data before writing it to the new RDBMS. Change data capture (CDC) and transfer rates would be key requirements.

The following questions are all related to the Statistical Analysis Software:

37. On which processor architecture is the customer deploying the software?
38. On which vendor server is the customer deploying the software?
39. On which server brand is the customer deploying the software?
40. On which processor vendor is the customer deploying the software?
41. On which processor brand is the customer deploying the software?
42. On which processor type is the customer deploying the software?
   a. Our answer to all of these will be based on your recommended hardware requirements, which are asked for in the RFQ.
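The cleansing rule described in answer 34a (dates in the future or too far back in the past) can be sketched as below. This is an illustrative Python sketch, not a feature of any particular ETL tool; the function name date_issue and the 1900-01-01 cutoff are assumptions chosen for the example, and a real deployment would choose its own cutoff.

```python
from datetime import date
from typing import Optional

EARLIEST_PLAUSIBLE = date(1900, 1, 1)  # assumed cutoff for "too far back"


def date_issue(value: date, today: date,
               earliest: date = EARLIEST_PLAUSIBLE) -> Optional[str]:
    """Return a reason string if the date looks implausible, else None."""
    if value > today:
        return "in the future"
    if value < earliest:
        return "too far in the past"
    return None
```

Returning a reason rather than a plain boolean lets the flagged rows be profiled by failure type before deciding how to correct them.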
