I have a datatable and I would like to select first entry per day where all curveIDs are present. The only way I could think of to do this is with a join because then it will automatically join only where both datasets are present.
Here is what I have so far:
//core data from sql (I have little control over this) DataTable ds = new DataTable(); da.Fill(ds); //creating dataset with various tables based on curveIDs I look for System.Data.DataSet dataSet = new System.Data.DataSet(); for (int i = 0; i < curveIds.Length; i++) { dataSet.Tables.Add(ds.AsEnumerable().Where(x => x.Field<short>("curveID") == curveIds[i]).CopyToDataTable()); } //lets say I have two only and then I join them like this to match timestamps correctly var result = from table1 in dataSet.Tables[0].AsEnumerable() join table2 in dataSet.Tables[1].AsEnumerable() on table1["Timestamp"] equals table2["Timestamp"] select new { Timestamp = (DateTime)table1["Timestamp"], Spread = (double)table1["mid"] - 0.4 * (double)table2["mid"], Power = (double)table1["mid"] }; //lastly I do a firstordefault over the data as I only want the first timestamp where both are present (this step doesnt return the correct data) var endres = result.OrderBy(a => a.Timestamp).GroupBy(a => a.Timestamp.ToShortDateString()).FirstOrDefault().ToList(); This seems very convoluted. The last step also does not return one recordset per day early in the morning but rather many datasets for one day.
In the full problem I would have to do this for 4-6 curveIDs which means I would have to do a variable number of joins which kind of makes this approach unfeasible.
Source data has the columns (Timestamp, CurveID, Mid) for each minute between 8am and 4pm on workdays but there is no guarantee that all curveIDs are actually there on every timestamp.
Lets say on day 1 at 8:01 all IDs are there (first time stamp where this is true but not only) and on day two only at 8:03 are all IDs there then the return data should be:
Day1 8:01, spread =x, Power=y Day2 8:03, spread =z, Power=a ...
... and so forth, for every day only one entry chosen as the first where all IDs are there.
3 Answers
Answers 1
If I understood well, you want to find the lowest timestamp of every day (in the datatable you have) that have all the "curveIDs" of you curveID list?
If so, then I have wrote a code that might solve it. If there is a mistake, let me know in comments. Working with lists is easier to understand than sets o datatables. So I just used you "ds" datatable and built a indenpendt code.
Also there is optimizations to make, but that would let code a bit more hard to understand.
List<object> endress = new List<object>(); //filter all timestamps, getting only the date info var timeStamps = ds.AsEnumerable().Select(r=> ((DateTime)r["Timestamp"]).Date).Distinct(); //for each id foreach (var timeStamp in timeStamps) { //find all the same timestamp (on the same day) var listSameTimestamp = ds.AsEnumerable().Where(r => ((DateTime)r["Timestamp"]).Date == timeStamp); var listIds = listSameTimestamp.Select(r => (int)r["curveID"]).Distinct(); //ensure they all have the curveIDs you are looking for var haveThemAll = curveIds.Intersect(listIds).Count() == curveIds.Count(); if (haveThemAll == false) continue; //find the lowest timestamp var rowFound = listSameTimestamp.OrderBy(r => (DateTime)r["Timestamp"]).FirstOrDefault(); if (rowFound == null) continue; //create an anonymous object (coud not understand your needs) endress.Add(new { Timestamp = (DateTime)rowFound["Timestamp"], Spread = (double)rowFound["mid"] - 0.4 * (double)rowFound["mid"], Power = (double)rowFound["mid"] }); } That is the "main" fragment. But you can see a whole test code here:
DataTable ds = new DataTable(); List<int> curveIds = new List<int>() {1,2,3,4}; public void Test() { LoadDs(); List<object> endress = new List<object>(); //filter all timestamps, getting only the date info var timeStamps = ds.AsEnumerable().Select(r=> ((DateTime)r["Timestamp"]).Date).Distinct(); //for each id foreach (var timeStamp in timeStamps) { //find all the same timestamp (on the same day) var listSameTimestamp = ds.AsEnumerable().Where(r => ((DateTime)r["Timestamp"]).Date == timeStamp); var listIds = listSameTimestamp.Select(r => (int)r["curveID"]).Distinct(); //ensure they all have the curveIDs you are looking for var haveThemAll = curveIds.Intersect(listIds).Count() == curveIds.Count(); if (haveThemAll == false) continue; //find the lowest timestamp var rowFound = listSameTimestamp.OrderBy(r => (DateTime)r["Timestamp"]).FirstOrDefault(); if (rowFound == null) continue; //create an anonymous object (coud not understand your needs) endress.Add(new { Timestamp = (DateTime)rowFound["Timestamp"], Spread = (double)rowFound["mid"] - 0.4 * (double)rowFound["mid"], Power = (double)rowFound["mid"] }); } foreach (var o in endress) { Console.WriteLine(o); } } public void LoadDs() { ds = new DataTable(); ds.Columns.Add("curveID",typeof(int)); ds.Columns.Add("Timestamp", typeof(DateTime)); ds.Columns.Add("mid", typeof(double)); for (int i = 0; i < 50000; i++) { Random rand = new Random(i); var row = ds.NewRow(); row["curveID"] = rand.Next(1,5); row["Timestamp"] = new DateTime(2016,4, rand.Next(1,5), rand.Next(1,3), 0,0); row["mid"] = rand.NextDouble(); ds.Rows.Add(row); } } Answers 2
If I understand correctly:
1. You have a table with timestamp, curveid, mid columns
2. timestamps are (at least typically) every minute, not all curveids are guaranteed present
3. You wish to calculate a spread, power using the rows for the first timestamp where all requisite curveids are present
I suggest something like this:
// I'll pretend the curveids are in this list... List<double> curveids = new List<double>(); DataTable table = ds.Tables["Your table"]; // first get a grouping of timestamps for the day containing all curveids // setup mindate and maxdate of your choosing... var grouping = table.AsEnumerable() .Where(x => curveids.Contains(x.curveid) && x.timestamp > mindate && x.timestamp < maxdate) .GroupBy(x => x.timestamp); // this gives a grouping of IEnumerable<IGrouping<DateTime, YourRowType>> // i.e. timestamps, and group of rows for each with curveids in your selection // Now get the minimum timestamp, where all curve ids are present.. DateTime minTimestamp = grouping.Where(x => x.Count(y => y.curveid) == curveids.Count) .Select(x => x.Key).Min(); // .. now can do what you wish with that... // For example: var resultRows = table.AsEnumerable().Where(x => x.timestamp == minTimestamp && curveids.Contains(x.Close)); Now you can use the resultRows and calculate the spread, power, etc or whatever depending upon the formula
Answers 3
Here is my take on it:
//selecting into an object for better readability and access var result = dt.AsEnumerable().Select(r => new { TimeStamp = r.Field<DateTime>("TimeStamp"), CurveID = r.Field<short>("CurveId"), Mid = r.Field<double>("Mid") }) // ignoring rows with different curve ID than in the list .Where(item => ids.Contains(item.CurveID)) // grouping by timestamp .GroupBy(item => item.TimeStamp) // selecting only groups that have all curve Ids .Where(g => g.Select(i=>i.CurveID).Distinct().Count() == ids.Count) // grouping the groups by date .GroupBy(g => g.Key.Date) .Select(g2 => { // getting the first timestamp group by timestamp var min = g2.OrderBy(i => i.Key).First(); // getting all the Mid values var values = min.Select(i => i.Mid).ToList(); // returning the desired computation return new { TimeStamp = min.Key, Spread = spread(values), Power = power(values) }; }) .ToList(); My assumptions from the text of the question and the existing comments are:
- timestamps must be exactly equal for all curve ids for the record to be taken into account
- records with curve ids that are not in the list are ignored
- only the mid values and the smallest timestamp where all curve ids meet each day matter to the end result
I have to add that this is not the most efficient method, as there are several passes that go through the data: first to group by timestamp and to filter by ids, then to filter by curveids, then to group by date and finally to first the first timestamp in the day. A faster but less readable implementation would order first, then only pass once through each item.
0 comments:
Post a Comment