Filter operator does not update statistics

March 19, 2018, 8:05 am

Hi,

if I filter the dataset, the statistics for a polynominal column of the dataset are not updated. As a result, if I plot a chart whose axis is based on this column, the axis is not auto-ranged, i.e. the data points all sit on the left side. It works with some datasets and not with others. The workaround is to filter everything, store it in Excel and load it again, but you can imagine that this is not ideal. In the picture you can see the statistics.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve AWAD8400_AWAD9700" width="90" x="45" y="544">
<parameter key="repository_entry" value="../../data/Full dataset/AWAD8400_AWAD9700"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="544">
<parameter key="parameter_expression" value=""/>
<parameter key="condition_class" value="custom_filters"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="Application version Minor.eq.6\.0"/>
<parameter key="filters_entry_key" value="Hour - start.ge.6"/>
<parameter key="filters_entry_key" value="Hour - start.lt.14"/>
<parameter key="filters_entry_key" value="Profile.is_in.ProfileA;ProfileB"/>
<parameter key="filters_entry_key" value="SetupId.is_in.INV0422\.014;INV0422\.015"/>
</list>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
</process>
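To make the expected behaviour concrete, here is a small Python sketch (not RapidMiner code) that mirrors the five AND-combined conditions of the Filter Examples configuration above on a few made-up rows and then recomputes column statistics from only the rows that pass. The row values are invented; only the column names and conditions come from the process, and reading `6\.0` as the literal value `6.0` (backslash escaping the dot) is an assumption about the filter syntax. If a chart took its range or value set from the "before" statistics instead of the "after" ones, you would get exactly the lopsided axis described; whether that is what happens inside RapidMiner is not verified here.

```python
# Made-up rows; only the column names and conditions come from the process above.
ROWS = [
    {"Application version Minor": "6.0", "Hour - start": 7, "Profile": "ProfileA", "SetupId": "INV0422.014"},
    {"Application version Minor": "6.0", "Hour - start": 13, "Profile": "ProfileB", "SetupId": "INV0422.015"},
    {"Application version Minor": "5.0", "Hour - start": 9, "Profile": "ProfileA", "SetupId": "INV0422.014"},
    {"Application version Minor": "6.0", "Hour - start": 20, "Profile": "ProfileC", "SetupId": "INV0999.001"},
]


def passes_filters(row):
    """All five conditions must hold, as with filters_logic_and = true."""
    return (
        row["Application version Minor"] == "6.0"             # eq 6\.0
        and row["Hour - start"] >= 6                          # ge 6
        and row["Hour - start"] < 14                          # lt 14
        and row["Profile"] in {"ProfileA", "ProfileB"}        # is_in
        and row["SetupId"] in {"INV0422.014", "INV0422.015"}  # is_in
    )


def column_statistics(rows, column):
    """Statistics from the given rows only: range for numbers, value set otherwise."""
    values = [row[column] for row in rows]
    if not values:
        return {}
    if all(isinstance(v, (int, float)) for v in values):
        return {"min": min(values), "max": max(values)}
    return {"values": sorted(set(values))}


filtered = [row for row in ROWS if passes_filters(row)]
for column in ("Hour - start", "SetupId"):
    print(column, "before:", column_statistics(ROWS, column),
          "after:", column_statistics(filtered, column))
```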

Problem with generalized linear model (lambda search)

March 19, 2018, 6:44 pm

Hi all,

I'm trying to do classification using a generalized linear model.

With the default settings, the lambda value is chosen by H2O (as described in the documentation).

However, I found that if I use lambda search, the performance is much better.

I don't understand the difference between these two methods.

Does the better performance of lambda search come from overfitting?

Thanks!

Best,

Scott
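For intuition, here is a generic sketch of what a lambda search does. It is not H2O's implementation (H2O, as I understand it, computes a regularization path for its elastic-net GLM): fit the model for several penalty strengths λ, score each fit on data that was not used for fitting, and keep the best one. The sketch uses one-feature ridge regression without intercept, where the fitted weight is w = Σxy / (Σx² + λ), on made-up numbers. On the overfitting question, the general point is that a λ chosen by looking at some data set tends to look optimistic on that same data set, so a fair comparison uses data that was used neither for fitting nor for choosing λ.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge weight for y ~ w * x (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the model y ~ w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def lambda_search(train, valid, lambdas):
    """Fit one model per lambda on train, score each on valid, return the best lambda."""
    scores = {lam: mse(fit_ridge_1d(*train, lam), *valid) for lam in lambdas}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data; the validation set is kept apart from the training set.
TRAIN = ([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.2, 3.9])
VALID = ([1.5, 2.5, 3.5], [1.4, 2.6, 3.4])

best_lambda, scores = lambda_search(TRAIN, VALID, [0.0, 0.1, 1.0, 10.0, 100.0])
print(best_lambda, round(scores[best_lambda], 4))
```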

SPADE algorithm for sequential pattern mining: does RapidMiner have it?

March 20, 2018, 11:56 pm

Hi RapidMiner,

SPADE (Sequential Pattern Discovery using Equivalence classes) is another algorithm for sequential pattern mining besides GSP, and CM-SPADE is an improved version of SPADE that uses co-occurrence information. May I know whether RapidMiner has either SPADE or CM-SPADE? If not, do you plan to implement them? I ask because I have seen some articles saying that SPADE is faster than GSP on large datasets.

Thank you very much for this information.

P.S.: below are the links to the SPADE and CM-SPADE papers:

https://link.springer.com/content/pdf/10.1023/A:1007652502315.pdf

https://www.philippe-fournier-viger.com/spmf/PAKDD2014_sequential_pattern_mining_CM-SPADE_CM-SPAM.pdf

Best Regards,

Phi Vu
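For readers unfamiliar with SPADE, its central idea (as described in the first linked paper) is a vertical data layout: each item gets an id-list of (sequence id, event id) pairs, and the support of a longer sequence is obtained by joining id-lists instead of rescanning the database. The Python sketch below shows only the temporal join for 2-sequences ⟨a, b⟩ with a naive nested loop on a made-up database; it leaves out SPADE's equivalence-class lattice search and is not RapidMiner functionality.

```python
from collections import defaultdict


def vertical_idlists(sequences):
    """Map each item to its id-list: the (sequence id, event id) pairs where it occurs."""
    idlists = defaultdict(list)
    for sid, events in enumerate(sequences):
        for eid, itemset in enumerate(events):
            for item in itemset:
                idlists[item].append((sid, eid))
    return idlists


def temporal_join(idlist_a, idlist_b):
    """Id-list of <a, b>: b occurs in a later event than a, within the same sequence."""
    joined = set()
    for sid_a, eid_a in idlist_a:
        for sid_b, eid_b in idlist_b:
            if sid_a == sid_b and eid_b > eid_a:
                joined.add((sid_b, eid_b))  # keep the event id of the last item
    return sorted(joined)


def support(idlist):
    """Support = number of distinct sequences that appear in the id-list."""
    return len({sid for sid, _ in idlist})


# Made-up sequence database: each sequence is a list of events, each event a set of items.
SEQUENCES = [
    [{"A"}, {"B"}],
    [{"A", "B"}, {"C"}],
    [{"B"}, {"A"}],
    [{"A"}, {"C"}, {"B"}],
]

idlists = vertical_idlists(SEQUENCES)
a_then_b = temporal_join(idlists["A"], idlists["B"])
print(a_then_b, support(a_then_b))
```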

Neural Network With Multiple Outputs for Prediction

March 21, 2018, 12:26 am

Hi,

I want a neural network with two outputs for prediction. I usually use the "Set Role" operator to set a label, so the neural network has one output. How do I set it up to get two outputs?

Thank you!
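Conceptually, a network with two outputs simply has two units in its output layer, each computing one predicted value from the same hidden layer; training then minimises the error over both targets together. The Python forward pass below illustrates that shape with made-up weights (3 inputs, 2 hidden units, 2 outputs). It says nothing about how RapidMiner's operators or roles would need to be configured for this, which is the actual question.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer feeding a linear output layer with one unit per target."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# Made-up weights: 3 inputs -> 2 hidden units -> 2 output units (two predicted values).
W_HIDDEN = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [[1.0, -1.0], [0.5, 0.5]]
B_OUT = [0.0, 0.2]

prediction = forward([1.0, 2.0, 0.5], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(prediction)  # a list with two values, one per output unit
```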

Generate a Random Number (New Number With Each Instance)

March 21, 2018, 8:05 am

Hi there

I'm working on a process that creates two random numbers. I have already tested the following approaches:

I used the "Generate Data" operator and tried using both the attributes and the label
I used "Generate Attributes" with rand()

Each approach gave me a working result: two numbers, the way I want. However, if I run the process a second time, the numbers do not change (no new random numbers are generated); every run of the process outputs the same numbers.

Which operator or workaround should I use to get new random numbers each time I run the process?

Many thanks in advance!

Roman
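The symptom described is what you get when a random generator starts from the same seed on every run. Whether this process uses a fixed seed is an assumption on my part (RapidMiner processes do, to my knowledge, have a random seed setting, but I have not checked this one). The general effect is easy to see with Python's `random` module: a fixed seed reproduces the same numbers, while an unseeded generator is seeded from the operating system or the clock and gives new numbers each time. The seed value 12345 is arbitrary.

```python
import random


def draw_two(seed=None):
    """Return two uniform random numbers in [0, 1).

    seed=None lets Python seed the generator from the operating system
    (or the clock), so every call starts from a different state.
    """
    rng = random.Random(seed)
    return rng.random(), rng.random()


# A fixed seed reproduces exactly the same "random" numbers on every run ...
run_1 = draw_two(seed=12345)
run_2 = draw_two(seed=12345)
# ... while an unseeded generator gives new numbers each time.
fresh_1 = draw_two()
fresh_2 = draw_two()
print(run_1 == run_2, fresh_1 != fresh_2)
```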

Looping through an example to generate more examples

March 21, 2018, 12:15 pm

My data is confidential, so I cannot post it.

Here is an example of what my data looks like:

I would like to transform it into:

Please help me do this.

Loop Files Operator runs forever

March 21, 2018, 2:20 pm

Hi All,

I'm trying to use the Loop Files operator on the TripAdvisor data from the process described here: https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Synonym-Detection-with-Word2Vec/ta-p/43860.

However, even with only 5 files in the directory, the process gets stuck at the Loop Files operator. Any suggestions on what's wrong?

Thanks!

Sentiment analysis for documents: beginner needs assistance

March 23, 2018, 6:46 am

Hello, I am very new to RapidMiner. I have searched for my question but can only find feedback regarding tweets. My problem is that I have run my process (see below) and get question marks for the polarity and subjectivity, and I'm not entirely sure why. I am analyzing the abstracts of approximately 480 articles (text documents between about three sentences and a paragraph long). Below is the code, and I am also attaching a screenshot of the results I am receiving. Any help will be greatly appreciated!

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">
<parameter key="excel_file" value="C:\Users\Dara\Desktop\dara\Digitization\Data.xlsx"/>
<parameter key="sheet_selection" value="sheet number"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1:J483"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Number.true.integer.attribute"/>
<parameter key="1" value="Article.true.text.attribute"/>
<parameter key="2" value="Date.true.date_time.attribute"/>
<parameter key="3" value="Title.true.text.attribute"/>
<parameter key="4" value="Address.true.text.attribute"/>
<parameter key="5" value="Type.true.text.attribute"/>
<parameter key="6" value="Abstract.true.text.label"/>
<parameter key="7" value="Country.true.text.attribute"/>
<parameter key="8" value="Keywords.true.text.attribute"/>
<parameter key="9" value="Publication.true.text.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="112" y="238">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="com.aylien.textapi.rapidminer:aylien_sentiment" compatibility="0.2.000" expanded="true" height="68" name="Analyze Sentiment" width="90" x="380" y="238">
<parameter key="connection" value="AylienDara"/>
<parameter key="input_attribute" value="Article"/>
<parameter key="sentiment_mode" value="document"/>
<parameter key="Is input URL" value="false"/>
</operator>
</process>

Can I and how do I use a correlation matrix for categorical variables?

March 23, 2018, 6:42 pm

Hello everyone,

I'm new to RapidMiner, so I apologise in advance for all the silly questions that I ask.

For a university project, I have a dataset that contains both categorical and numerical variables. We are supposed to choose predictors for our label "recommended", which is a binominal variable.

First, in addition to the >0.5 correlation rule, can I choose my predictors based on the attribute weights in the AttributeWeight table? How do I interpret this weight table? Why do its values contradict the correlation values?

Second, can I use categorical variables in my correlation matrix? If so, how do I transform my categorical variables into dummy variables so that I can use them in the matrix? I know about the Nominal to Numerical operator, but I am not sure it is the correct way to go, because I am getting only negative correlations (that's 14 attributes negatively correlated with Recommended)! Is that normal?

Thanks a TON.
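Dummy coding (which, as far as I know, is one of the coding options of Nominal to Numerical) turns each category into its own 0/1 column, and each such column can then be correlated with a label coded as 1 = yes, 0 = no. The Python sketch below does this on made-up data. It also shows a property worth knowing when reading such a matrix: the dummy columns of one attribute always sum to 1 in every row, so their covariances with the label add up to exactly 0. If one category correlates positively with the label, the other categories of that attribute must lean negative, so seeing many negative dummy correlations is not by itself a sign that something went wrong.

```python
import math


def one_hot(values):
    """Dummy-code a nominal column: one 0/1 column per distinct category."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


def covariance(xs, ys):
    """Population covariance of two equally long numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up data: one nominal attribute and a binominal label coded as 1 = yes, 0 = no.
colour = ["red", "blue", "red", "green", "blue", "red"]
recommended = [1, 0, 1, 0, 0, 1]

dummies = one_hot(colour)
for category, column in dummies.items():
    print(category, round(pearson(column, recommended), 3))

# Dummy columns of one attribute sum to 1 in every row, so these covariances sum to 0.
total_cov = sum(covariance(column, recommended) for column in dummies.values())
```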

Comma values - best predictive model suggestion

March 24, 2018, 9:11 am

Hello, I would like to know which predictive operator is best for my problem.

My main goal is to find the predictive operator that best handles the comma values (Input 1, Input 2, Input 3) and the YES/NO values in the "Input 4" column to predict the values in the "Output" column. I already tried Optimize Parameters (Evolutionary) with operators like SVM (kernel types: radial, ANOVA, Epanechnikov, etc.), Random Forest, Neural Net and Decision Tree to find optimal values, but the average prediction accuracy was low.

I would really appreciate it if someone could suggest a predictive operator that could produce the best accuracy for this kind of problem.

NOTE: In every row, the values in the columns "Input 1", "Input 2" and "Input 3" always add up to 100%, if this helps.

Thanks very much in advance.
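Two preprocessing points follow from the description, sketched below in Python under a stated assumption: that "comma values" means numbers written with a decimal comma (such as "33,5"), with no thousands separators. Such text has to be converted to real numbers before most learners can use it. And because Input 1 + Input 2 + Input 3 = 100 in every row, Input 3 is fully determined by the other two and carries no extra information; whether dropping it helps depends on the learner. The rows are made up.

```python
def parse_decimal_comma(text):
    """Convert '33,5' or '33,5%' to 33.5.

    Assumes ',' is the decimal mark and that there is no thousands separator.
    """
    return float(text.strip().rstrip("%").replace(",", "."))


# Made-up rows in the shape described in the post: Input 1, Input 2, Input 3.
RAW_ROWS = [
    ["33,5", "41,0", "25,5"],
    ["10,25%", "60,0%", "29,75%"],
]

parsed = [[parse_decimal_comma(cell) for cell in row] for row in RAW_ROWS]

for row in parsed:
    # The post says the three inputs always add up to 100 %.
    assert abs(sum(row) - 100.0) < 1e-9, row

# Input 3 = 100 - Input 1 - Input 2, so the first two columns hold all the information.
reduced = [row[:2] for row in parsed]
print(parsed, reduced)
```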

Facebook saying RapidMiner is not a registered app

March 25, 2018, 5:05 am

When requesting an OAuth token from Facebook using the Facebook extension, I get the following response:

App not set up: This app is still in development mode, and you don't have access to it. Switch to a registered test user or ask an app admin for permissions.

How to apply SMOTE upsampling

March 26, 2018, 3:59 am

Hello, sorry, I am new to data mining. I have a classification project on loan default, and my data is imbalanced.

Should I apply SMOTE upsampling before splitting the data or after? My data is not large, only 1030 samples.
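The usual advice (general practice, not a statement about a specific RapidMiner operator) is to split first and oversample only the training part. Otherwise synthetic points created from examples that later land in the test set leak information into training and make the test accuracy look better than it is; with cross-validation on 1030 samples the same rule applies inside each training fold. The sketch below is a simplified SMOTE-style oversampler in plain Python: each synthetic point lies on the segment between a minority example and one of its k nearest minority neighbours. The data, the every-fourth-row split and `k=3` are made-up choices for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each between a minority point and one of
    its k nearest minority neighbours (the core SMOTE idea, simplified)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up imbalanced data: 20 "no default" (0) and 6 "default" (1) examples.
data = [((float(i), float(i % 5)), 0) for i in range(20)]
data += [((20.0 + j, 5.0 + j), 1) for j in range(6)]

# 1) Split FIRST, so the test set only ever contains real examples.
test = [d for idx, d in enumerate(data) if idx % 4 == 0]
train = [d for idx, d in enumerate(data) if idx % 4 != 0]

# 2) Oversample the minority class in the training part only.
minority_train = [x for x, y in train if y == 1]
n_majority = sum(1 for _, y in train if y == 0)
new_points = smote_like(minority_train, n_majority - len(minority_train))
balanced_train = train + [(x, 1) for x in new_points]

print(len(test), n_majority, sum(1 for _, y in balanced_train if y == 1))
```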

Nominal to Date

March 26, 2018, 8:37 am

Dear Rapidminer Community

I am struggling again with some Nominal to Date conversions and I hope you can help me. I am extracting a date value with an "Extract Macro" operator and get a nominal value. Now I want to convert this nominal value from the macro back to a date value.

The nominal format is:

May 22, 2016

How can I convert this into a date? All I get is "Cannot parse date", and I have already tried plenty of options.

Any help would be greatly appreciated!

Best regards

Felix
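The string consists of an English month name, the day, a comma and a four-digit year, and a parser only succeeds if its pattern matches every one of these pieces, including the comma and the space. A minimal Python sketch of such a pattern is below. For RapidMiner, whose date format parameters use, as far as I know, Java `SimpleDateFormat` letters, the equivalent pattern would presumably be `MMM d, yyyy` with an English locale; treat that as an unverified assumption rather than a tested setting.

```python
from datetime import datetime


def parse_long_date(text):
    """Parse strings like 'May 22, 2016': month name, day, comma, four-digit year."""
    # .strip() guards against stray spaces around a value taken from a macro.
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


print(parse_long_date("May 22, 2016"))  # 2016-05-22
```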

About RapidMiner Studio

March 26, 2018, 9:16 am

What classifiers or algorithms does RapidMiner use in the backend for preprocessing the data?

Which algorithm is the "Neural Net" operator trained with?

March 27, 2018, 2:56 am

As stated in its description, the "Deep Learning" operator is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using back-propagation.

I use the "Neural Net" operator, and I want to train the net with other algorithms, such as stochastic gradient descent, Levenberg-Marquardt and Bayesian regularization. How can I do this?

Does the "Neural Net" operator use only back-propagation, without any optimization method?
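Independent of how the "Neural Net" operator is implemented internally (which I cannot confirm), it may help to separate two things: back-propagation computes the gradient of the error with respect to the weights, and the optimization method decides how that gradient is turned into a weight update. Gradient descent with momentum is one such method; Levenberg-Marquardt and Bayesian regularization are others. The Python sketch below trains a single linear neuron on made-up data (y = 2x + 1). For one neuron, back-propagation reduces to the ordinary mean-squared-error gradient; the learning rate, momentum and epoch count are illustrative values only.

```python
def train_linear_neuron(xs, ys, learning_rate=0.05, momentum=0.9, epochs=500):
    """Fit y ~ w*x + b with full-batch gradient descent plus momentum."""
    w, b = 0.0, 0.0
    vw, vb = 0.0, 0.0  # velocities: running blend of past updates
    n = len(xs)
    for _ in range(epochs):
        # Gradient step (what back-propagation provides): d(MSE)/dw and d(MSE)/db.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Optimizer step (momentum): reuse part of the previous update.
        vw = momentum * vw - learning_rate * grad_w
        vb = momentum * vb - learning_rate * grad_b
        w += vw
        b += vb
    return w, b


w, b = train_linear_neuron([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 4), round(b, 4))
```

Swapping the two momentum lines for a different update rule would change the optimizer while the gradient computation stays the same.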

Where is the predicted result of training set using neural net?

March 27, 2018, 3:53 am

I use the "Neural Net" operator to make a prediction. I split the ExampleSet into a training set and a testing set. I can easily find the predictions for the testing set; however, where are the predictions for the training set? I also want the training results.

Thank you!

Jeffery

[Example] Sentiment Analysis

March 15, 2018, 8:58 am

Hello All,

Enjoy this data set and model example of sentiment analysis from our team of data scientists here at RapidMiner.

Cheers!

Handle error message "Too few examples"

March 28, 2018, 10:13 am

Dear Rapidminer Community,

I have divided my process into several sub-processes, and these sub-processes contain several filters. The filters select certain values in a certain time frame (e.g. 01.01.2015 - 20.04.2015). Sometimes a filter does not return any examples, which triggers the error message "Too few examples". As this information is not important to me, I would like to "catch" this error and have the process jump to the next sub-process(!) without any message or interruption. Is this possible? I would like to catch all of these errors and simply let the process run until the end.

I hope it is clear what I want to do. ;)

Best regards

Felix
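As far as I know, RapidMiner has a Handle Exception operator for this kind of try-and-fall-back pattern, but its exact setup is not shown here. The underlying idea, catching one specific error and moving on to the next step, looks like this in Python. `TooFewExamples` and `filter_by_period` are hypothetical stand-ins, not RapidMiner APIs, and the dates are made up (the first period matches the example time frame in the post).

```python
from datetime import date


class TooFewExamples(Exception):
    """Hypothetical stand-in for the "Too few examples" error."""


def filter_by_period(rows, start, end):
    """Keep rows whose date lies in [start, end]; raise if nothing is left."""
    selected = [row for row in rows if start <= row["date"] <= end]
    if not selected:
        raise TooFewExamples(f"no examples between {start} and {end}")
    return selected


def run_sub_processes(rows, periods):
    """Run one filtering step per period and silently skip the empty ones."""
    results = {}
    for start, end in periods:
        try:
            results[(start, end)] = len(filter_by_period(rows, start, end))
        except TooFewExamples:
            continue  # nothing to report for this period: move on to the next one
    return results


ROWS = [{"date": date(2015, 1, 10)}, {"date": date(2015, 3, 5)}, {"date": date(2016, 6, 1)}]
PERIODS = [
    (date(2015, 1, 1), date(2015, 4, 20)),   # two matching rows
    (date(2015, 5, 1), date(2015, 12, 31)),  # no matching rows, skipped
    (date(2016, 1, 1), date(2016, 12, 31)),  # one matching row
]
print(run_sub_processes(ROWS, PERIODS))
```

Catching only the specific error type, rather than every exception, keeps genuine failures visible.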

Process Documents from Files: Include all subdirectories

March 28, 2018, 4:02 pm

Hi all,

In the parameters box of "Process Documents from Files" I can set the "text directories".

I have a lot of HTML files on my local hard drive, in a folder called "webpage" with many subfolders (~100). That is too many subfolders to add them all separately, and I am missing a checkbox to include all subfolders. Is there a way I can achieve this? I have created a CSV file with the columns "class name" and "directory"; it would be great if I could import this.

Cheers, Roger
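If there is no built-in option for this (I have not checked), one workaround is to generate the folder list automatically instead of typing it. The Python sketch below walks the whole folder tree and writes a CSV with the same two columns mentioned in the post. Using each folder's own name as its class name is my assumption, and whether RapidMiner can import such a CSV into the "text directories" list is not something I can confirm. The usage part builds a throwaway tree standing in for "webpage".

```python
import csv
import os
import tempfile


def list_class_directories(root):
    """Return (class name, directory) for every folder below root, at any depth.

    The folder's own name is used as the class name (an assumption; adapt as needed).
    """
    pairs = []
    for current, subdirs, _files in os.walk(root):
        subdirs.sort()  # walk subfolders in a stable, alphabetical order
        if current != root:
            pairs.append((os.path.basename(current), os.path.abspath(current)))
    return pairs


def write_class_csv(root, csv_path):
    """Write the pairs to a CSV file with the header "class name","directory"."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["class name", "directory"])
        writer.writerows(list_class_directories(root))


# Usage on a throwaway folder tree that stands in for "webpage".
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "webpage")
    for sub in ("news", "shop", os.path.join("news", "sport")):
        os.makedirs(os.path.join(root, sub))
    csv_path = os.path.join(tmp, "classes.csv")
    write_class_csv(root, csv_path)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
print([row[0] for row in rows])
```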

Mutation prob of the operator "Optimize Parameters (Evolutionary)"

March 28, 2018, 6:57 pm

Hi,

when I use the "Optimize Parameters (Evolutionary)" operator, the parameters include "mutation type", "selection type", "crossover prob", etc.

However, where can I set the value of "mutation prob"? Or can you tell me what the default value is?
